Abstract

We treat the ‘approximately’ optimal control problem for tandem queueing or production networks (with local feedback allowed) under heavy traffic. The buffers (scaled with traffic) are finite. The controls allow various inputs, connecting links and the processors to be shut down or opened, in order to manage the system. The service and arrival rates, as well as the routing probabilities, can also be controlled, and the system statistics can depend on the system state (scaled buffer occupancies). The associated costs involve holding costs, costs for shutting off/on the links or processors, and the opportunity cost for lost production. It is shown that the (scaled) controlled system converges weakly (in an appropriate sense) to a controlled limit ‘reflected’ diffusion. In the rescaled time, the actions of the controllers lead to multiple ‘simultaneous’ impulses in the limit problem. Thus we have a non-standard limit control problem, and the usual methods of weak convergence for systems under heavy traffic must be modified. Since the optimal or nearly optimal controls for the physical process are usually not possible to obtain, it is of considerable interest to know whether an optimal or nearly optimal control for the limit process is also nearly optimal for the physical system in heavy traffic. This is shown to be true under reasonable conditions. Although the limit control problem is non-standard and there is little available theory concerning it, acceptable numerical procedures are available.

Key Words: Weak convergence, queueing networks, production networks, heavy traffic approximations, controlled reflected diffusions, controlled queueing networks, approximately optimal stochastic controls, numerical methods for stochastic control.


1. Introduction


We consider optimal and ‘nearly optimal’ control problems for the open queueing networks in heavy traffic of the type dealt with in the fundamental papers of Reiman [1] and Harrison [2], [3]. Owing to the state and control dependence (in our problem) of the routing, arrival and service time processes, as well as to our use of finite buffers, and to some approximations which are used in the modelling in [1]-[3], much of their methodology cannot be carried over. We do try to retain their structure and results wherever possible. One of the main motivations behind the heavy traffic approximations [1]-[4] of queueing networks is the idea that the limit process (which is a reflected Brownian motion in the past work, and a more general impulsively or singularly controlled reflected diffusion here) is easier to analyze than the actual physical process, and that it is much easier to find good or optimal control policies for the limit than for the physical process. This is undoubtedly true, particularly if the traffic is truly heavy, the buffer size large, or if the routing parameters and input and service times are correlated or state (queue size) dependent.

In [1], one has several interconnected service or processing stations, and at each there is an infinite buffer (ours is finite, but suitably scaled with traffic intensity). At each there are possible arrivals from outside the network as well as arrivals routed from other service stations. The departures are routed (perhaps randomly) to other service stations (perhaps to one that they had previously visited) or leave the network. Eventually (w.p.1) all customers leave the network. Under reasonable conditions on the interarrival and service times, and with appropriate spatial and temporal normalizations, in the heavy traffic case the vector of the normalized queue lengths (the normalized number in the buffers plus in service) converges weakly to a reflected Brownian motion with constant drift and covariance parameters [1]. This will be generalized here in several directions, although we work with a somewhat simpler network structure.

Although it underlies a lot of the motivation for the limit theorems, there seems to have been very little work on the usefulness of the limit process for purposes of getting a good or nearly optimal control for the physical process. Let $\epsilon$ index the traffic intensity. As $\epsilon \to 0$, the ‘intensity’ goes to $\infty$. For whatever cost criterion is used (this will be defined in later sections), let $V^\epsilon(\pi)$ denote its value for the physical system when a policy $\pi$ is used. Suppose that $\pi^\epsilon$ is an ‘adaptation’ of the optimal (or $\delta$-optimal) policy for the limit, applied to the physical process. (We will say more about such adaptations later.) For $\pi^\epsilon$ to be a ‘good’ policy for the physical process we need at least that

$$V^\epsilon(\pi^\epsilon) - \inf_{\pi} V^\epsilon(\pi)$$

be small for small $\epsilon$, where the inf is over an appropriate set of policies for the physical process. This is the problem addressed here. In the course of the development, a number of interesting and non-classical problems arise; for example, the appropriate ‘limit’ control problem might involve multiple ‘simultaneous’ impulses, and we must treat state dependent service, arrival and routing processes.

There are many possibilities for the structure of the control problem. Ours, to be described below, illustrates the main problems and develops a (weak convergence based) method which applies to many other formulations. We are forced to differ in several important respects from models used in earlier work on the limit theorems for queueing networks in heavy traffic. If the service or arrival rates can be controlled, then the limit process is no longer a reflected Brownian motion with constant coefficients. We wish to allow these rates to depend on the system state. We must deal with (implicitly or explicitly) a dynamically controlled upper bound to the buffer size (even if the buffer size is infinite, the optimal control might force it to be shut down). Owing to the control, there might be ‘travel’ along the boundaries. Some controls (e.g., on/off controls with associated impulsive costs) might yield nice process paths in ‘real’ time, but in the usual interpolated time (i.e., for the sequence for which we seek the weak convergence) the paths between the on/off times move faster and faster as $\epsilon \to 0$ and converge to a discontinuity, but not in the Skorohod topology. The nature of the convergence at these discontinuities can yield an interesting limit process with ‘multiple simultaneous impulses’. The lumping together of all idle times, as done in the $B_k(t)$ argument of [1, eqn (3)], is a useful ‘approximation’, but it is inappropriate in our context owing to the state and control dependencies, and it is not quite the exact physical model in any case (although it yields the correct results). Finally, to show that the ‘limit’ controls and other quantities are ‘admissible’, or non-anticipative with respect to the limit Brownian motions or reflected diffusions, we need an approach that is at least partly along the lines of the martingale method. In fact, we combine the ideas of [1] with those of the martingale method and the weak convergence techniques of [5], [6].

The work here is a continuation of the lines of development in [6], [7], [8], where approximations to other optimal control problems are dealt with. Owing to the special features of the controlled heavy traffic network of queues, this past work is not applicable to this problem without major change. We refer to it where helpful in simplifying or reducing an argument.

In Section 2, the basic system is described, the control problem defined and assumptions stated. Many of the results are true for controlled networks allowing general feedback as in [1]. But, in order to avoid some quite complicated bookkeeping, we eventually specialize to a tandem case, with only two processors and feedback only allowed from a processor to itself. The general results can be readily extended to problems where the flow is all ‘forward’ (except for the possibility of rerouting an output back to the input of the same processor). In Section 3, we discuss representations for the processes which facilitate the weak convergence analysis, and in Section 4, we describe the proper ‘limit’ control problem (and some of its peculiarities), i.e., the appropriate controlled reflected diffusion whose optimal (or $\delta$-optimal) controls are to be used for the physical process.

Section 5 contains the basic weak convergence results, and we state and prove the results concerning the ‘almost optimality’ of the $\delta$-optimal (for small $\delta$) controls for the limit process, when applied to the physical process. Some computational questions are discussed in Section 6. Although the ‘limit’ control problem is not always simple, effective and convenient numerical methods are available.


2. Problem Description and Assumptions


We start by describing a network with K service stations (processors), the ith referred to as $P_i$. Each processor services only one customer at a time (although, as will be seen from the development in the sequel, batch or multi-server cases can all be handled and even controlled). Shortly, we specialize to the case K = 2, but it is simpler to first use a unified terminology. We retain the basic interconnection structure of [1], but use a discrete time parameter for notational simplicity. Each processor can be connected to an external input as well as receive (and deliver) outputs from (to) other processors.

Let $\{\alpha^{i,\epsilon}_n\}$ denote the sequence of interarrival times of the customers coming from the exterior of the network directly to $P_i$, and consider the indicator of the event that there was an arrival from the exterior to $P_i$ at time n. As is frequently done (e.g., as in [1]), we adopt the convenient representation where the processor keeps processing even if the queue is empty, with the ‘errors’ generated by this convention accounted for by an added reflection term. With this convention in mind, let $\{\Delta^{i,\epsilon}_n\}$ denote the sequence of service times for $P_i$, and consider the indicator of the event that a service at $P_i$ is completed at time n (whether or not there are actual ‘physical’ customers in $P_i$ at that time). As in [1], we suppose that if there is an arrival to $P_i$ in the midst of a service interval when the queue at $P_i$ is empty, then the actual service time for that customer is just the residual service time for the current service interval. Under the heavy traffic assumption, this does not affect the limit formulas. Let $I^{ij,\epsilon}_n$, $i = 1, \dots, K$, $j = 0, \dots, K$, denote the indicator function of the event that a completed service at $P_i$ at time n is scheduled to be sent to $P_j$ (or to the exterior, if j = 0). We use $\{p_{ij},\ i, j = 1, \dots, K\}$ to denote the probability that a completed service from $P_i$ is to be routed to $P_j$, and write $p_{i0} = 1 - \sum_{j=1}^K p_{ij}$. The buffer size at $P_i$ is $B_i/\sqrt{\epsilon}$, for $B_i > 0$.
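The ‘keep processing even if the queue is empty’ convention can be illustrated for a single station in discrete time: fictitious completions at an empty queue are counted as departures and then cancelled by a reflection term. The Python sketch below is only an illustration under simplifying assumptions of our own (one station, no routing or controls, 0/1 arrival and completion indicators supplied as lists, and an arriving customer who cannot complete service in the same step); the names `simulate_station`, `arrivals` and `completions` are hypothetical.

```python
def simulate_station(arrivals, completions, q0=0):
    """Single free-running server with a reflection correction (sketch).

    arrivals[n]    -- 1 if a customer arrives from the exterior at time n, else 0
    completions[n] -- 1 if the server ends a service interval at time n,
                      whether or not a customer is present
    Returns (queue path, cumulative reflection term Y). A completion at an
    empty queue is fictitious: it is subtracted in the raw balance and added
    back through Y, so Y increases only at times when the queue is empty.
    """
    q = q0
    path, Y = [q0], [0]
    for a, d in zip(arrivals, completions):
        y = d if q == 0 else 0      # fictitious departure -> reflection increment
        q = q + a - d + y           # same as q + a - d * [q > 0]
        path.append(q)
        Y.append(Y[-1] + y)
    return path, Y


# Usage: the raw balance Q(N) = Q(0) + #arrivals - #completions + Y(N) is exact.
arr, dep = [1, 0, 0, 1], [1, 1, 0, 1]
path, Y = simulate_station(arr, dep)
assert path[-1] == path[0] + sum(arr) - sum(dep) + Y[-1]
```

The same bookkeeping, with one reflection term per routing destination and extra correction terms for the control intervals, is what the processes $Y$ and $U$ introduced below keep track of.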

The allowable control efforts are as follows. We work with impulsive controls only, although the results can be extended to the case where the service and interarrival ‘rates’ as well as the routing probabilities are controlled continuously. The processor $P_i$ can be shut off for a time, at a cost $k_i > 0$, to be paid at the moment of shut off. The external inputs to $P_i$ can be shut off for a time, at a cost $k_{0i} > 0$, to be paid at the moment of shut off. If $P_i$ communicates to $P_j$, then in lieu of shutting $P_i$ off, we can open or break the link connecting $P_i$ to $P_j$. In that case the output of $P_i$ which is destined for $P_j$ will be shunted to the exterior and lost, or sold as a ‘partially completed’ product. The cost for shutting the link off is $k_{ij} > 0$, to be paid at the moment of shut off, and there will be an additional cost for the lost customers. This cost is $q_{ij}\sqrt{\epsilon}$ per lost customer, $q_{ij} > 0$. By convention, we allow all customers in $P_i$ who have completed service there and are destined to return to $P_i$ to do so immediately. If the buffer of $P_i$ is full, then one or more inputs must be turned off, i.e., either the input links to $P_i$ are shunted to the exterior, or the $P_j$ connecting to $P_i$ are shut off.

The bulk of the work will use the above control possibilities. The extension to the case where the marginal service or external arrival rates (or even the routing probabilities) are controlled is not difficult and is discussed at the end of the paper.

Let $P^{i,\epsilon}_n$, $P^{0i,\epsilon}_n$ and $P^{ij,\epsilon}_n$, resp., denote the indicators of the events that $P_i$ is working at time n (i.e., processing or not shut off), the external input to $P_i$ is not shut off at time n, and the link connecting $P_i$ to $P_j$ is open at time n. Let $N^{i,\epsilon}_n$ (resp., $\bar N^{i,\epsilon}_n$) denote the nth time that $P_i$ is turned off (resp., turned back on), and set $\bar N^{i,\epsilon}_0 = 0$. Let $N^{ij,\epsilon}_n$ ($i = 0, 1, \dots, K$, $j = 1, \dots, K$) (resp., $\bar N^{ij,\epsilon}_n$) denote the nth time that the link connecting $P_i$ to $P_j$ is shut off (resp., turned back on). (If i = 0, then it is the link connecting the exterior to $P_j$.) Define $\nu^{i,\epsilon}_n = \epsilon N^{i,\epsilon}_n$ and $\bar\nu^{i,\epsilon}_n = \epsilon \bar N^{i,\epsilon}_n$, and similarly define $\nu^{ij,\epsilon}_n$ and $\bar\nu^{ij,\epsilon}_n$.

Let $X^{i,\epsilon}_n = \sqrt{\epsilon}\,[\text{number of customers in or waiting for service at } P_i \text{ at time } n]$ and set $X^{i,\epsilon}(t) = X^{i,\epsilon}_{t/\epsilon}$. This is the quantity of interest in the desired interpolated time and amplitude scale. Then, in this interpolated scale, $[\nu^{i,\epsilon}_n, \bar\nu^{i,\epsilon}_n)$, $n \ge 1$, etc., are the intervals of closure of $P_i$, etc. When ratios $t/\epsilon$ are used as indices, we use the integral part. Until Sections 5 and 6, w.l.o.g., and for notational convenience, we always assume that all processors and links are working at t = 0. Thus $\bar\nu^{i,\epsilon}_0 = 0$ and $\nu^{i,\epsilon}_n \ge \bar\nu^{i,\epsilon}_{n-1}$ for n > 0. In general, it is possible that $\nu^{i,\epsilon}_1 = 0$ also (instantaneous change in the system at the starting time). The optimal value function will depend on the initial system configuration, and the true state of the system is actually the pair ($X^\epsilon$, status of links and processors). We return to this in Section 5.
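The amplitude and time scaling $X^{i,\epsilon}(t) = X^{i,\epsilon}_{t/\epsilon}$, with $X^{i,\epsilon}_n$ equal to $\sqrt{\epsilon}$ times the number of customers at $P_i$, can be written as a small Python helper. This is a hypothetical illustration (the name `scaled_path` and the list input are ours); it uses the integral part of $t/\epsilon$, as stated above, and refuses times beyond the recorded horizon rather than extrapolating. In the same units, the scaled buffer level $B_i$ corresponds to $B_i/\sqrt{\epsilon}$ customers.

```python
import math


def scaled_path(counts, eps):
    """Return t -> sqrt(eps) * counts[floor(t / eps)], a piecewise-constant path.

    counts -- numbers of customers at one processor at real times n = 0, 1, ...
    eps    -- the heavy-traffic scaling parameter (eps > 0)
    """
    root = math.sqrt(eps)

    def X(t):
        n = math.floor(t / eps)          # integral part of t / eps
        if not 0 <= n < len(counts):
            raise ValueError("t is outside the recorded horizon")
        return root * counts[n]

    return X


X = scaled_path([0, 4, 8, 12], eps=0.25)   # sqrt(0.25) = 0.5
print(X(0.6))                              # prints 4.0 (index 2, i.e. 0.5 * 8)
```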

In order to keep track of the flows in the system for purposes of the control problem and the limit theorems, we need to separate out the corrections to the flows due to empty queues and the flow components due to the control actions. Throughout the paper, $\epsilon$-superscripts will be omitted in the terms in sums or integrals. The subscript c is for ‘combined’, since we use it when there is a condition on the status of two controls simultaneously. Define

The $Y^{ij,\epsilon}_c(\cdot)$ corrects for arrivals to $P_j$ from $P_i$ when the buffer of $P_i$ is empty and neither $P_i$ nor the link from $P_i$ to $P_j$ is shut off. The $U^{0i,\epsilon}(\cdot)$ corrects for the stopped external arrivals when the input to $P_i$ from the exterior is shut off. The $U^{ij,\epsilon}(\cdot)$ corrects for the stopped departures from $P_i$ (destined for $P_j$) when $P_i$ is closed, and the $U^{ij,\epsilon}_c(\cdot)$ corrects for the stopped arrivals from $P_i$ to $P_j$ when either $P_i$ is not working or the link from $P_i$ to $P_j$ is shut off (i.e., shunted to the exterior).

The $Z^{ij,\epsilon}(\cdot)$ represents the lost output when the link from $P_i$ to $P_j$ is shunted to the exterior. There can only be lost output at time n if $X^{i,\epsilon}_n > 0$ and $P^{i,\epsilon}_n = 1$ (as well as $P^{ij,\epsilon}_n = 0$). Write $X^\epsilon = (X^{1,\epsilon}, \dots, X^{K,\epsilon})$ and let $\pi^\epsilon$ or $\pi$ denote control policies (i.e., rules for determining the $\nu^{i,\epsilon}_n$, $\bar\nu^{i,\epsilon}_n$, $\nu^{ij,\epsilon}_n$, $\bar\nu^{ij,\epsilon}_n$), and let $E^\pi_x$ denote the expectation, given policy $\pi$ and initial condition $X^\epsilon(0) = x$. Let P denote the vector of indicator functions ($P^\alpha$) of the processors and links. In general, the value function depends on the initial value of P (although we set, w.l.o.g., the initial values $P^\alpha_0 = 1$ until Section 5). Then, for a bounded and continuous $k(\cdot)$ and $\beta > 0$, our cost will be of the discounted form (2.6).


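$$\begin{aligned} V^\epsilon(\pi,x,P) ={}& E^\pi_x\int_0^\infty e^{-\beta t}\,k(X^\epsilon(t))\,dt + E^\pi_x\sum_n\sum_{i=1}^K k_i\,e^{-\beta\nu^{i,\epsilon}_n} + E^\pi_x\sum_n\sum_{i=0}^K\sum_{j=1}^K k_{ij}\,e^{-\beta\nu^{ij,\epsilon}_n}\\ &+ E^\pi_x\int_0^\infty e^{-\beta t}\Big[\sum_{i=1}^K q_{0i}\,dU^{0i,\epsilon}(t) + \sum_{i,j=1}^K q_{ij}\,dZ^{ij,\epsilon}(t)\Big]. \end{aligned}\tag{2.6}$$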

The first term in (2.6) is the holding cost. The next two are the costs for the impulsive switching, and the last is the cost of lost output via either non-admittance of customers or forcing them out of the system before the total required processing is completed.
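For a simulated path, one sample of the quantity inside the expectations in (2.6) can be accumulated term by term. The Python sketch below is a hypothetical illustration, not a procedure from this paper: it assumes the path is recorded at the real times n (so $X^\epsilon$ is constant on $[n\epsilon, (n+1)\epsilon)$ and the holding integral can be taken exactly), that shut-off events are supplied as (interpolated time, impulse cost) pairs, and that each lost customer on a channel with coefficient $q$ adds one increment $\sqrt{\epsilon}$ to the corresponding $U^{0i,\epsilon}$ or $Z^{ij,\epsilon}$, hence costs $q\sqrt{\epsilon}$, discounted at its loss time.

```python
import math


def discounted_cost_sample(X, eps, beta, holding, switch_events,
                           lost_counts, lost_unit_costs):
    """One-path sample of a discounted cost with the structure of (2.6).

    X               -- scaled states X_n at real times n = 0, 1, ...
    eps, beta       -- scaling parameter and discount rate (beta > 0)
    holding         -- bounded continuous holding-cost function k(.)
    switch_events   -- (nu, cost) pairs: interpolated shut-off time nu and the
                       impulse cost (k_i or k_ij) paid at that moment
    lost_counts     -- lost_counts[n][m]: customers lost at step n on channel m
    lost_unit_costs -- q_m for each channel; one lost customer raises U or Z
                       by sqrt(eps), so it costs q_m * sqrt(eps)
    """
    root = math.sqrt(eps)
    # X is constant on [n*eps, (n+1)*eps), so the holding integral is exact.
    step_weight = (1.0 - math.exp(-beta * eps)) / beta
    hold = sum(holding(x) * math.exp(-beta * n * eps) * step_weight
               for n, x in enumerate(X))
    switch = sum(c * math.exp(-beta * nu) for nu, c in switch_events)
    lost = sum(math.exp(-beta * n * eps) * root
               * sum(q * m for q, m in zip(lost_unit_costs, counts))
               for n, counts in enumerate(lost_counts))
    return hold + switch + lost
```

Averaging the returned value over many independent simulated paths under a fixed policy gives a Monte Carlo estimate of $V^\epsilon(\pi, x, P)$.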

The average cost per unit time problem could be handled as well, but is somewhat more complicated. See, for example, the average cost per unit time problems in [6], [8] for other models.

We now specialize to the case of Figure 2.1. We specialize since it is awkward to keep track of the effects of the controls in a network with general feedback allowed, particularly of the effects of empty queues which are (at least partly) due to the control actions. With mainly notational changes, the case dealt with here can be extended to the general case where the only allowed feedback in the system is from the output of a processor to its own input; otherwise the flow is ‘forward’.

Refer to Figure 2.1, and assume (A2.1). The first part of this assumption (or restriction on the control actions) says simply that if a queue is empty, then we will not continue to ‘starve’ it, but will turn on all the inputs. The assumption seems to be quite unrestrictive, and it does simplify the bookkeeping quite a bit.

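A2.1. If $X^{2,\epsilon}_n = 0$, then all inputs to $P_2$ are open, i.e., $P^{1,\epsilon}_n = P^{12,\epsilon}_n = P^{02,\epsilon}_n = 1$. If $X^{1,\epsilon}_n = 0$, then the input to $P_1$ is open (i.e., $P^{01,\epsilon}_n = 1$). If some $X^{i,\epsilon}_n = B_i$, then all inputs to $P_i$ are closed.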
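Assumption (A2.1) acts as a constraint that any admissible policy for the system of Figure 2.1 must respect. A minimal Python sketch of a filter that adjusts a proposed control status so that (A2.1) holds is given below. It works with unscaled customer counts and capacities $B_i/\sqrt{\epsilon}$; the dictionary keys are hypothetical names for $P_1$, $P_{12}$, $P_{01}$, $P_{02}$; and when $P_2$ is full it closes the link from $P_1$ (shunting its output), which is only one of the two options allowed in the text (shutting $P_1$ off is the other).

```python
def enforce_a21(q1, q2, cap1, cap2, status):
    """Return a copy of `status` adjusted so that (A2.1) holds (sketch).

    q1, q2     -- current numbers of customers at P1 and P2
    cap1, cap2 -- buffer sizes B_i / sqrt(eps), in customers
    status     -- dict of 0/1 entries for "P1", "P12", "P01", "P02"
    """
    s = dict(status)
    if q2 == 0:                 # never starve an empty P2: open all its inputs
        s["P1"] = s["P12"] = s["P02"] = 1
    if q1 == 0:                 # never starve an empty P1
        s["P01"] = 1
    if q1 >= cap1:              # P1 full: close its only controllable input
        s["P01"] = 0
    if q2 >= cap2:              # P2 full: close the external input, and close the
        s["P02"] = 0            # link from P1 (shunting its output); shutting
        s["P12"] = 0            # P1 off instead would also satisfy (A2.1)
    return s
```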

For the system of Figure 2.1, and under (A2.1), we have that (2.1)-(2.5) take the forms (2.7)-(2.9). Here, $P^{2,\epsilon}_n = 1$, since there is never a need to shut $P_2$ off. ((2.7) is written for easy reference; all the $Y^{ij}$, $U^{ij}$ are still defined by (2.1)-(2.2).)

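$$Y^{12,\epsilon}_c(t) = \sqrt{\epsilon}\sum_{n<t/\epsilon}[\text{service completion at } P_1 \text{ at time } n]\;I^{12,\epsilon}_n\,P^{1,\epsilon}_n\,P^{12,\epsilon}_n\,I_{\{X^{1,\epsilon}_n = 0\}}\tag{2.7}$$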


=  ✓r  z  %  I"  I{X2=0) 


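$$\begin{aligned} Z^{12,\epsilon}(t) &= \sqrt{\epsilon}\sum_{n<t/\epsilon}[\text{service completion at } P_1 \text{ at time } n]\;I^{12,\epsilon}_n\,(1 - P^{12,\epsilon}_n)\,P^{1,\epsilon}_n\,I_{\{X^{1,\epsilon}_n > 0\}}\\ &= U^{12,\epsilon}_c(t) - U^{12,\epsilon}(t) - \sum_n\int_{\nu^{12,\epsilon}_n\wedge t}^{\bar\nu^{12,\epsilon}_n\wedge t} dY^{12,\epsilon}(s) \end{aligned}$$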


The $Y^{12,\epsilon}(\cdot)$ will converge to a continuous function and $\bar\nu^{12,\epsilon}_n - \nu^{12,\epsilon}_n \to 0$. Thus the last term on the right of the last equation will disappear in the limit. Define $U^{1,\epsilon}(\cdot) = U^{10,\epsilon}(\cdot) + U^{12,\epsilon}(\cdot)$. Then


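$$X^{1,\epsilon}(t) = A^{1,\epsilon}(t) - D^{10,\epsilon}(t) - D^{12,\epsilon}(t) + Y^{10,\epsilon}(t) + Y^{12,\epsilon}(t) - U^{01,\epsilon}(t) + U^{1,\epsilon}(t)\tag{2.8a}$$

$$X^{2,\epsilon}(t) = A^{2,\epsilon}(t) - D^{20,\epsilon}(t) + D^{12,\epsilon}(t) + Y^{20,\epsilon}(t) - Y^{12,\epsilon}_c(t) - U^{02,\epsilon}(t) - U^{12,\epsilon}_c(t)\tag{2.8b}$$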

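$$\begin{aligned} V^\epsilon(\pi,x,P) ={}& E^\pi_x\int_0^\infty e^{-\beta t}\,k(X^\epsilon(t))\,dt + k_1E^\pi_x\sum_n e^{-\beta\nu^{1,\epsilon}_n} + \sum_{i=1}^2 k_{0i}\,E^\pi_x\sum_n e^{-\beta\nu^{0i,\epsilon}_n}\\ &+ k_{12}\,E^\pi_x\sum_n e^{-\beta\nu^{12,\epsilon}_n} + E^\pi_x\int_0^\infty e^{-\beta t}\Big[\sum_{i=1}^2 q_{0i}\,dU^{0i,\epsilon}(t) + q_{12}\,dZ^{12,\epsilon}(t)\Big]. \end{aligned}\tag{2.9}$$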




We now give some more definitions and state the heavy traffic assumptions. It will sometimes be convenient to write the multiple sequence $\nu^\epsilon = \{\nu^{i,\epsilon}_n, \bar\nu^{i,\epsilon}_n, \nu^{ij,\epsilon}_n, \bar\nu^{ij,\epsilon}_n\}$ as a single sequence. Let $\{T^\epsilon_n\}$ denote the sequence of event times indicated by all the elements of $\nu^\epsilon$ in order of increasing time, but without respect to which events they indicate, or whether they indicate multiple events. Define $R^\epsilon_n = (R^{1,\epsilon}_n, R^{01,\epsilon}_n, R^{02,\epsilon}_n, R^{12,\epsilon}_n)$, where $R^{\alpha,\epsilon}_n = 1$, $-1$ or $0$ depending on whether the ‘control’ with the same superscript was opened (turned on), closed (turned off) or left unchanged at $T^\epsilon_n$. From $(R^\epsilon_n, T^\epsilon_n)$, we can recover all the control actions and their times.
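The passage from the separate switching sequences to the single sequence $(T^\epsilon_n, R^\epsilon_n)$, and back, is a simple merge. The Python sketch below assumes (our convention, for illustration) that each control's actions are given as (time, ±1) pairs, with +1 for opening and −1 for closing, and that the components of $R^\epsilon_n$ are ordered as $(R^{1}, R^{01}, R^{02}, R^{12})$; the function names are hypothetical. Simultaneous actions on different controls share one $T^\epsilon_n$.

```python
from collections import defaultdict

CONTROLS = ("P1", "P01", "P02", "P12")   # order of the components of R_n


def merge_control_events(events):
    """Merge per-control switching times into the single sequence (T_n, R_n).

    events -- dict mapping a name in CONTROLS to a list of (time, action)
              pairs, action = +1 (opened) or -1 (closed)
    Returns (T, R): T is the increasing list of distinct event times and R[n]
    has one entry per control: +1, -1, or 0 (unchanged at T[n]).
    """
    by_time = defaultdict(dict)
    for name, pairs in events.items():
        for t, action in pairs:
            if action not in (1, -1):
                raise ValueError("action must be +1 or -1")
            if name in by_time[t]:
                raise ValueError(f"two actions for {name} at time {t}")
            by_time[t][name] = action
    T = sorted(by_time)
    R = [tuple(by_time[t].get(c, 0) for c in CONTROLS) for t in T]
    return T, R


def split_control_events(T, R):
    """Recover the per-control (time, action) lists from (T_n, R_n)."""
    out = {c: [] for c in CONTROLS}
    for t, r in zip(T, R):
        for c, a in zip(CONTROLS, r):
            if a != 0:
                out[c].append((t, a))
    return out
```

In the example used in the test, $P_1$ and the link $P_{12}$ are both closed at the same time 0.5, so a single $R^\epsilon_n$ carries two nonzero entries.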
Let $S^{i,\epsilon}_{a,n} = \sum_{j=1}^n \alpha^{i,\epsilon}_j$ and $S^{i,\epsilon}_{d,n} = \sum_{j=1}^n \Delta^{i,\epsilon}_j$. Let $E^{i,\epsilon}_{a,n}$ denote the expectation given the arrival, departure and control intervals and actions which ended by real time $S^{i,\epsilon}_{a,n}$, as well as the lengths of all other arrival and service intervals (other than $\alpha^{i,\epsilon}_{n+1}$) which started by, but which might not have been completed by, time $S^{i,\epsilon}_{a,n}$. Analogously, let $E^{i,\epsilon}_{d,n}$ denote the expectation given the arrival, departure and control intervals and actions which ended by real time $S^{i,\epsilon}_{d,n}$, as well as the lengths of all other arrival and service intervals (other than $\Delta^{i,\epsilon}_{n+1}$) which started by $S^{i,\epsilon}_{d,n}$. Define the conditional variances $\mathrm{var}^{i,\epsilon}_{a,n}$ and $\mathrm{var}^{i,\epsilon}_{d,n}$ analogously. Define


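$$\bar\alpha^{i,\epsilon}_{n+1} = E^{i,\epsilon}_{a,n}\,\alpha^{i,\epsilon}_{n+1}, \qquad \mathrm{var}^{i,\epsilon}_{a,n}\,\alpha^{i,\epsilon}_{n+1} = E^{i,\epsilon}_{a,n}\big(\alpha^{i,\epsilon}_{n+1}\big)^2 - \big(\bar\alpha^{i,\epsilon}_{n+1}\big)^2,$$

$$\bar\Delta^{i,\epsilon}_{n+1} = E^{i,\epsilon}_{d,n}\,\Delta^{i,\epsilon}_{n+1}, \qquad \mathrm{var}^{i,\epsilon}_{d,n}\,\Delta^{i,\epsilon}_{n+1} = E^{i,\epsilon}_{d,n}\big(\Delta^{i,\epsilon}_{n+1}\big)^2 - \big(\bar\Delta^{i,\epsilon}_{n+1}\big)^2.$$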


Henceforth, when we say that $P_i$, $P_{0i}$ or $P_{12}$, resp., is open (closed) at time n, we mean that processor i is working, the link from the exterior to $P_i$ is open, or the link from $P_1$ to $P_2$ is open for traffic, resp.

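We will use

A2.2. There are positive numbers $g_{ai}$ and $g_{di}$ and bounded continuous functions $a^i(\cdot)$ and $d^i(\cdot)$ such that

$$\frac{1}{\bar\alpha^{i,\epsilon}_{n+1}} = g_{ai} + \sqrt{\epsilon}\,a^i_n + o(\sqrt{\epsilon}), \qquad \frac{1}{\bar\Delta^{i,\epsilon}_{n+1}} = g_{di} + \sqrt{\epsilon}\,d^i_n + o(\sqrt{\epsilon}),$$

where $a^i_n = a^i(X^\epsilon_{S^{i,\epsilon}_{a,n}})$ and $d^i_n = d^i(X^\epsilon_{S^{i,\epsilon}_{d,n}})$.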

Comment on (A2.2). We allow the (marginal) external inter-arrival intervals and the service intervals to depend on the system state. The argument $X^\epsilon_{S^{i,\epsilon}_{a,n}}$ (for example) is the proper one, since $S^{i,\epsilon}_{a,n}$ is the (real) starting time for the (n+1)st (external) inter-arrival interval to $P_i$ (the moment of arrival to $P_i$ of the (n+1)st customer from the outside), and $X^\epsilon_{S^{i,\epsilon}_{a,n}}$ is the system state at that time. We could let the marginal mean rates $a^i(\cdot)$ and $d^i(\cdot)$ be controlled. We then use $a^i(X^\epsilon_{S^{i,\epsilon}_{a,n}}, r^{i,\epsilon}_{a,n})$, etc. Here $r^{i,\epsilon}_{a,n}$ is the control over the mean marginal rate. There is no problem in incorporating controlled rates into the weak convergence and approximation results of Section 5. An appropriate associated cost would include a direct cost (higher for higher rates) and an indirect cost due to the possible gain in production due to the higher (input) rates. Similarly, the $g_{ai}$ can be controlled or even state dependent, provided only that the heavy traffic assumption (A2.4) below continues to hold.

A2.3. The set $\{|\alpha^{i,\epsilon}_n|^2, |\Delta^{i,\epsilon}_n|^2 : i \le 2,\ n < \infty,\ \text{small } \epsilon,\ \text{all control actions}\}$ is uniformly integrable.




A2.4. (Heavy traffic assumption)

$$g_{a1} = (1 - p_{11})\,g_{d1}, \qquad \frac{p_{12}\,g_{d1} + g_{a2}}{1 - p_{22}} = g_{d2}.$$

(A2.4) is also what one would get from Reiman’s [1] formulas for the case of Figure 2.1. If either condition in (A2.4) is violated, then either some buffer will always be full as $\epsilon \to 0$ (and the cost will go to $\infty$) or else some $X^{i,\epsilon}(t) \to 0$ as $\epsilon \to 0$ (and the cost will go to $\infty$). With little extra trouble it is possible to control the $p_{ij}$ also, but this seems to be of not much interest for the case of Figure 2.1. The results for our case can readily be extended to the case of ‘feedforward’ systems, where the only allowed feedback in the routing is from a processor to itself. For these general cases, it might be worth controlling (marginally) the $p_{ij}$. The extension is simple, and follows the same lines as would the extension to marginally controlled rates.
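The two balance conditions in (A2.4) are easy to check numerically. In the Python sketch below we read $g_{ai}$ and $g_{di}$ as nominal arrival and service rates (reciprocals of the mean interarrival and service times, in our reading of the expansion (A2.2)); the function name and argument order are ours. A positive residual indicates that inflow exceeds capacity at that processor, a negative one that capacity exceeds inflow.

```python
def heavy_traffic_residuals(g_a1, g_a2, g_d1, g_d2, p11, p12, p22):
    """Residuals of the two conditions in (A2.4); both vanish in heavy traffic.

    g_a1, g_a2 -- nominal external arrival rates to P1, P2
    g_d1, g_d2 -- nominal service rates of P1, P2
    p11, p12, p22 -- routing probabilities of the two-processor tandem
    """
    if not (0 <= p11 < 1 and 0 <= p12 <= 1 - p11 and 0 <= p22 < 1):
        raise ValueError("invalid routing probabilities")
    r1 = g_a1 - (1.0 - p11) * g_d1                    # balance at P1
    r2 = (p12 * g_d1 + g_a2) / (1.0 - p22) - g_d2     # balance at P2
    return r1, r2
```

For example, with $p_{11} = p_{22} = 0.5$, $p_{12} = 0.25$, $g_{a1} = 0.5$, $g_{a2} = 0.25$ and $g_{d1} = g_{d2} = 1$, both residuals are zero.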


A2.5. The routing variables $\{I^{ij,\epsilon}_k;\ i, j, k\}$ are mutually independent and independent of the $\{\alpha^{i,\epsilon}_k, \Delta^{i,\epsilon}_k\}$, and $P\{I^{ij,\epsilon}_k = 1\} = p_{ij}$.


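A2.6. There are continuous functions $\sigma_{ai}(\cdot)$, $\sigma_{di}(\cdot)$ such that

$$\mathrm{var}^{i,\epsilon}_{a,n}\,\alpha^{i,\epsilon}_{n+1} = \sigma^2_{ai}\big(X^\epsilon_{S^{i,\epsilon}_{a,n}}\big) + \delta^\epsilon_n, \qquad \mathrm{var}^{i,\epsilon}_{d,n}\,\Delta^{i,\epsilon}_{n+1} = \sigma^2_{di}\big(X^\epsilon_{S^{i,\epsilon}_{d,n}}\big) + \delta^\epsilon_n,$$

where $\delta^\epsilon_n \to 0$, uniformly in all other variables.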


Comment on (A2.5) and (A2.6). We allow the conditional variance to depend on the state here, just to show the possibilities. Controlled variances can also be handled. In many applications (and in most past works on the heavy traffic model) the $\sigma_{ai}$ are just constants. The independence in (A2.5) can also be weakened, and the sequence of interarrival times or service intervals can be correlated (in ways other than via the ‘state’ dependence used here). This would involve a more complex method for obtaining the weak convergence. The perturbed test function methods of [5] (see also [6]) are quite suitable for that task, and would require only moderate changes in the proof of Theorem 5.1, but the additional notational, etc., burden seems hardly worth it now.
3. A Convenient Representation for $X^\epsilon(\cdot)$


In this section, we center and rewrite the terms of (2.8), so as to facilitate the weak convergence analysis in Section 5. We will do three things. First, the A and D processes will be centered, the centering terms simplified, and the centered processes written as a rescaling of simpler processes. This is similar to the procedure of [1]. Then we will represent the $Y^{ij,\epsilon}$ and $U^{ij,\epsilon}$ in terms of simpler processes $Y^{i,\epsilon}$ and $U^{i,\epsilon}$ (not depending on j) plus a term which will go to zero as $\epsilon \to 0$. Finally, we will represent $Y^{i,\epsilon}$ and $X^{i,\epsilon}$ as continuous (and unique) functions of the ‘other’ data, similar to the representation used in [1].

Centering of the Arrival and Departure Processes. Now, several processes will be defined. Define $S^{i,\epsilon}_a(t)$ (and analogously $S^{i,\epsilon}_d(t)$) to be the inverse of the interpolated arrival time function $\epsilon S^{i,\epsilon}_{a,n}$ in the sense that

$$S^{i,\epsilon}_a(t) = \max\{k : \epsilon S^{i,\epsilon}_{a,k} \le t\}.$$
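Numerically, the inverse $S^{i,\epsilon}_a(t)$ just defined is a right-continuous inverse of the partial sums and can be evaluated by binary search. The Python sketch below takes $S^{i,\epsilon}_{a,0} = 0$ and counts $k \ge 0$, so that $S^{i,\epsilon}_a(t) = 0$ before the first arrival; these conventions and the helper name `arrival_count_inverse` are our assumptions.

```python
import bisect
from itertools import accumulate


def arrival_count_inverse(interarrivals, eps):
    """Return t -> max{k >= 0 : eps * S_{a,k} <= t}, with S_{a,0} = 0 (t >= 0).

    interarrivals -- alpha_1, alpha_2, ... in real (unscaled) time units
    """
    partial = [0] + list(accumulate(interarrivals))   # S_{a,0}, S_{a,1}, ...
    scaled = [eps * s for s in partial]               # interpolated times

    def S_a(t):
        # bisect_right counts the k with eps * S_{a,k} <= t (the list is
        # nondecreasing); subtracting 1 gives the largest such k.
        return bisect.bisect_right(scaled, t) - 1

    return S_a


S = arrival_count_inverse([2, 3, 1], eps=0.5)   # interpolated arrival times 1.0, 2.5, 3.0
```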
The second equality in the first definition follows from the fact that the arrival indicator equals 1 only at the left endpoint of each interarrival interval, and the length of that interval is the corresponding interarrival time (and similarly for the second definition).


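Owing to the independence assumptions in (A2.5), we can (and will henceforth) replace the $I^{ij,\epsilon}_{S^{i,\epsilon}_{d,k}}$ by $I^{ij,\epsilon}_k$. We can write $A^{i,\epsilon}(\cdot)$ in the form (which defines $\tilde A^{i,\epsilon}(\cdot)$ and $B^{i,\epsilon}_a(\cdot)$)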


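$$A^{i,\epsilon}(t) = \sqrt{\epsilon}\sum_{k=1}^{S^{i,\epsilon}_a(t)}\Big[1 - \frac{\alpha^{i,\epsilon}_k}{\bar\alpha^{i,\epsilon}_k}\Big] + \sqrt{\epsilon}\sum_{k=1}^{S^{i,\epsilon}_a(t)}\frac{\alpha^{i,\epsilon}_k}{\bar\alpha^{i,\epsilon}_k} \equiv \tilde A^{i,\epsilon}(t) + B^{i,\epsilon}_a(t).\tag{3.2}$$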

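Doing the same thing for the $D^{ij,\epsilon}(\cdot)$, we have (which defines $\tilde D^{ij,\epsilon}(\cdot)$ and $B^{ij,\epsilon}_d(\cdot)$)

$$D^{ij,\epsilon}(t) = \tilde D^{ij,\epsilon}(t) + B^{ij,\epsilon}_d(t),\tag{3.3}$$

where

$$B^{ij,\epsilon}_d(t) = \sqrt{\epsilon}\sum_{k=1}^{S^{i,\epsilon}_d(t)} p_{ij}\,\frac{\Delta^{i,\epsilon}_k}{\bar\Delta^{i,\epsilon}_k}.\tag{3.4}$$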

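For purposes of calculation below, write

$$\tilde D^{10,\epsilon}(t) + \tilde D^{12,\epsilon}(t) = \sqrt{\epsilon}\sum_{k=1}^{S^{1,\epsilon}_d(t)}\Big[(1 - I^{11,\epsilon}_k) - (1 - p_{11})\,\frac{\Delta^{1,\epsilon}_k}{\bar\Delta^{1,\epsilon}_k}\Big].$$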


We now cancel the ‘principal parts’ of the $B^{\cdot,\epsilon}$ terms. By taking the terms in the order in which they would appear in the centering of the first three terms of (2.8a) and using the expansion in (A2.2), we write
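$$\begin{aligned} B^{1,\epsilon}_a(t) - \big(B^{10,\epsilon}_d(t) + B^{12,\epsilon}_d(t)\big) ={}& \sqrt{\epsilon}\sum_{k=1}^{S^{1,\epsilon}_a(t)}\alpha^{1,\epsilon}_k\big[g_{a1} + \sqrt{\epsilon}\,a^1_{k-1} + o(\sqrt{\epsilon})\big]\\ &- \sqrt{\epsilon}\sum_{k=1}^{S^{1,\epsilon}_d(t)}\Delta^{1,\epsilon}_k\big[g_{d1} + \sqrt{\epsilon}\,d^1_{k-1} + o(\sqrt{\epsilon})\big](1 - p_{11}). \end{aligned}\tag{3.5}$$

Since $\sum_{k=1}^{S^{1,\epsilon}_a(t)}\alpha^{1,\epsilon}_k = t/\epsilon$ (mod $O(1)$), the principal term of the first sum is $g_{a1}t/\sqrt{\epsilon}$ (mod $O(\sqrt{\epsilon})$), and that of the second is $(1 - p_{11})g_{d1}t/\sqrt{\epsilon}$ (mod $O(\sqrt{\epsilon})$). These cancel by (A2.4). By using the definitions of $a^1_k$ and $d^1_k$ and the fact that $X^\epsilon_k$ changes by at most $O(\sqrt{\epsilon})$ per step, we can write the sum of the middle terms in the first sum of (3.5) as

$$\epsilon\sum_{k=0}^{t/\epsilon - 1} a^1(X^\epsilon_k) + (\text{term which} \to 0 \text{ as } \epsilon \to 0),$$

and similarly for the analogous terms in the second sum.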
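Written out, the cancellation of the principal terms is the following short computation. It uses only $\sum_{k\le S^{1,\epsilon}_a(t)}\alpha^{1,\epsilon}_k = t/\epsilon + O(1)$, the corresponding statement for the service intervals (taken here as an assumption, "similarly" in the text), and the first condition in (A2.4); the second line of the display shows where the middle terms of (3.5) come from.

$$\begin{aligned} &\sqrt{\epsilon}\,g_{a1}\sum_{k=1}^{S^{1,\epsilon}_a(t)}\alpha^{1,\epsilon}_k - \sqrt{\epsilon}\,(1-p_{11})\,g_{d1}\sum_{k=1}^{S^{1,\epsilon}_d(t)}\Delta^{1,\epsilon}_k = \sqrt{\epsilon}\,\big[g_{a1} - (1-p_{11})\,g_{d1}\big]\Big(\frac{t}{\epsilon} + O(1)\Big) = O(\sqrt{\epsilon}),\\ &\sqrt{\epsilon}\sum_{k=1}^{S^{1,\epsilon}_a(t)}\alpha^{1,\epsilon}_k\,\sqrt{\epsilon}\,a^1_{k-1} = \epsilon\sum_{k=1}^{S^{1,\epsilon}_a(t)}\alpha^{1,\epsilon}_k\,a^1\big(X^\epsilon_{S^{1,\epsilon}_{a,k-1}}\big) \approx \epsilon\sum_{n=0}^{t/\epsilon-1}a^1(X^\epsilon_n). \end{aligned}$$

The last approximation holds because each interarrival interval of real length $\alpha^{1,\epsilon}_k$ contributes $\alpha^{1,\epsilon}_k$ real time steps, over which $X^\epsilon$ moves by only $O(\sqrt{\epsilon})$ per step and $a^1(\cdot)$ is continuous.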

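With the above cancellations and the last representation, we can rewrite (3.5) as (3.6), where $\delta^{1,\epsilon}(\cdot) \to 0$ uniformly on bounded intervals. Equation (3.6) defines $B^{1,\epsilon}(\cdot)$ and $b^1(\cdot)$, namely $b^1(x) = a^1(x) - (1 - p_{11})\,d^1(x)$:

$$\begin{aligned} B^{1,\epsilon}_a(t) - \big(B^{10,\epsilon}_d(t) + B^{12,\epsilon}_d(t)\big) &= \epsilon\sum_{k=0}^{t/\epsilon-1}\big[a^1(X^\epsilon_k) - (1 - p_{11})\,d^1(X^\epsilon_k)\big] + \delta^{1,\epsilon}(t)\\ &= \int_0^t b^1(X^\epsilon(s))\,ds + \delta^{1,\epsilon}(t) \equiv B^{1,\epsilon}(t) + \delta^{1,\epsilon}(t). \end{aligned}\tag{3.6}$$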


Repeating the procedure for the ‘biases’ arising from (2.8b), we get (which defines $B^{2,\epsilon}(\cdot)$ and $b^2(\cdot)$)

$$\begin{aligned} B^{2,\epsilon}_a(t) - B^{20,\epsilon}_d(t) + B^{12,\epsilon}_d(t) &= \epsilon\sum_{k=0}^{t/\epsilon-1}\big[a^2(X^\epsilon_k) - (1 - p_{22})\,d^2(X^\epsilon_k) + p_{12}\,d^1(X^\epsilon_k)\big] + \delta^{2,\epsilon}(t)\\ &= \int_0^t b^2(X^\epsilon(s))\,ds + \delta^{2,\epsilon}(t) \equiv B^{2,\epsilon}(t) + \delta^{2,\epsilon}(t). \end{aligned}$$
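The drift functions defined by (3.6) and by the analogous display for $P_2$, together with $B^{i,\epsilon}(t) = \int_0^t b^i(X^\epsilon(s))\,ds$ for the piecewise-constant interpolated path, can be coded directly. In this Python sketch the rate functions $a^i(\cdot)$, $d^i(\cdot)$ are passed in as callables of the state, and the helper names are hypothetical.

```python
def make_drifts(a1, a2, d1, d2, p11, p12, p22):
    """Drifts b1 = a1 - (1-p11) d1 and b2 = a2 - (1-p22) d2 + p12 d1."""
    def b1(x):
        return a1(x) - (1.0 - p11) * d1(x)

    def b2(x):
        return a2(x) - (1.0 - p22) * d2(x) + p12 * d1(x)

    return b1, b2


def integrated_drift(b, X, eps, t):
    """B(t) = integral_0^t b(X(s)) ds for the path X(s) = X[floor(s / eps)]."""
    if t < 0 or t > eps * len(X):
        raise ValueError("t is outside the recorded horizon")
    n_full = int(t // eps)                       # complete steps in [0, t]
    total = eps * sum(b(X[k]) for k in range(n_full))
    if n_full < len(X):                          # remaining partial step
        total += (t - n_full * eps) * b(X[n_full])
    return total
```

Evaluating the integral exactly on each step, rather than by the Riemann sum $\epsilon\sum_k b^i(X^\epsilon_k)$, only changes the result by $O(\epsilon)$, which is absorbed in $\delta^{i,\epsilon}(\cdot)$.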

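A Representation for $U^{12,\epsilon}_c$, $Y^{ij,\epsilon}$. Define the processes (with $P^{2,\epsilon}_n = 1$)

$$Y^{i,\epsilon}(t) = \sqrt{\epsilon}\sum_{n<t/\epsilon}[\text{service completion at } P_i \text{ at time } n]\;P^{i,\epsilon}_n\,I_{\{X^{i,\epsilon}_n = 0\}}, \qquad i = 1, 2.$$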

We  can  also  write 


(3.9a) 


oo 

u12'£(t)  = 

E 

f  dU£ 

n=l 

Jv^rv 

00 

,v12,€ 

Yc12-£(t)  - 

E 

n=0 

f  n+1 

Pl2 

1  i  yl2,€ 
n 

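It will turn out (Section 5) that the following limits hold:

$$\begin{aligned} &Y^{1j,\epsilon}(\cdot) - p_{1j}\,Y^{1,\epsilon}(\cdot) \to 0,\\ &Y^{20,\epsilon}(\cdot) - (1 - p_{22})\,Y^{2,\epsilon}(\cdot) \to 0,\\ &Y^{12,\epsilon}_c(t) - \sum_{n=0}^{\infty} p_{12}\int_{\bar\nu^{12,\epsilon}_n\wedge t}^{\nu^{12,\epsilon}_{n+1}\wedge t} dY^{1,\epsilon}(s) \to 0,\\ &Y^{12,\epsilon}_c(\cdot) - p_{12}\,Y^{1,\epsilon}(\cdot) \to 0,\\ &U^{1j,\epsilon}(\cdot) - U^{1,\epsilon}(\cdot)\,p_{1j}/(1 - p_{11}) \to 0, \qquad j = 0, 2,\\ &U^{12,\epsilon}_c(t) - \sqrt{\epsilon}\,p_{12}\sum_{n=0}^{t/\epsilon}[\text{service completion at } P_1 \text{ at time } n]\,\big(1 - P^{1,\epsilon}_n P^{12,\epsilon}_n\big) \to 0,\\ &Z^{12,\epsilon}(\cdot) - \big[U^{12,\epsilon}_c(\cdot) - U^{12,\epsilon}(\cdot)\big] \to 0,\\ &\bar\nu^{\alpha,\epsilon}_n - \nu^{\alpha,\epsilon}_n \to 0, \qquad \text{each } \alpha, n. \end{aligned}\tag{3.9b}$$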


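In order to prepare for the utilization of these convergences and simplifications, rewrite (2.8) and (2.9) as follows, where the $\rho^{i,\epsilon}(\cdot)$ and $\rho^{3,\epsilon}(\cdot)$ are ‘small error’ processes and the $W^{i,\epsilon}(\cdot)$ are defined to be the sum of the first three terms in the middle part of (3.10a) and (3.10b), resp., i.e., $W^{1,\epsilon} = \tilde A^{1,\epsilon} - \tilde D^{10,\epsilon} - \tilde D^{12,\epsilon}$ and $W^{2,\epsilon} = \tilde A^{2,\epsilon} - \tilde D^{20,\epsilon} + \tilde D^{12,\epsilon}$.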

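$$\begin{aligned}
X^{1,\varepsilon}(t)&=A^{1,\varepsilon}(t)-\left(D^{10,\varepsilon}(t)+D^{12,\varepsilon}(t)\right)+B^{1,\varepsilon}(t)+\left(Y^{10,\varepsilon}(t)+Y^{12,\varepsilon}(t)\right)-U^{01,\varepsilon}(t)+U^{1,\varepsilon}(t)+\rho^{1,\varepsilon}(t)\\
&=W^{1,\varepsilon}(t)+B^{1,\varepsilon}(t)+(1-p_{11})Y^{1,\varepsilon}(t)-U^{01,\varepsilon}(t)+U^{1,\varepsilon}(t)+\rho^{1,\varepsilon}(t)
\end{aligned}\tag{3.10a}$$

$$\begin{aligned}
X^{2,\varepsilon}(t)&=A^{2,\varepsilon}(t)-D^{20,\varepsilon}(t)+D^{12,\varepsilon}(t)+B^{2,\varepsilon}(t)+Y^{20,\varepsilon}(t)-Y^{12,\varepsilon}(t)-U^{02,\varepsilon}(t)-U^{c12,\varepsilon}(t)+\rho^{2,\varepsilon}(t)\\
&=W^{2,\varepsilon}(t)+B^{2,\varepsilon}(t)+(1-p_{22})Y^{2,\varepsilon}(t)-p_{12}Y^{1,\varepsilon}(t)-U^{02,\varepsilon}(t)-U^{c12,\varepsilon}(t)+\rho^{2,\varepsilon}(t)
\end{aligned}\tag{3.10b}$$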

(3.11) $V^\varepsilon(\pi,x)=$ [eqn (2.9) with $Z^{12,\varepsilon}(\cdot)$ replaced by $U^{c12,\varepsilon}(\cdot)-U^{12,\varepsilon}(\cdot)$ and an 'error' term $\rho^{c,\varepsilon}(\cdot)$ added].

It will turn out (Section 5) that, for any sequence of controls $\pi^\varepsilon$ with $\sup_\varepsilon V^\varepsilon(\pi^\varepsilon,x)<\infty$, $\sup_{t\le T}|\rho^{i,\varepsilon}(t)|\to 0$ in distribution for any $T<\infty$, and similarly for the $\rho^{c,\varepsilon}(\cdot)$.

Owing to the impulsive nature of the 'control' part of the cost (2.9), on any bounded time interval there are only a finite number (w.p.1) of subintervals on which the controls are active (i.e., where some $P_i$ or $P_{ij}$ is shut off). By the definitions, the reflection terms $Y^{ij,\varepsilon}(\cdot)$ cannot increase on these 'control intervals'. In particular, $Y^{1,\varepsilon}(\cdot)$ (and $Y^{1j,\varepsilon}(\cdot)$) can only increase when $P_{01}$ and $P_1$ are on (recall that $P_{01}$ is on when $X^1=0$). Also, $Y^{2,\varepsilon}(\cdot)$ (and $Y^{20,\varepsilon}(\cdot)$) can increase only when all of $P_1$, $P_{12}$ and $P_{02}$ are on (by (A2.1), if $X^{2,\varepsilon}=0$, then all inputs must be turned on). Because of this, the setup of [1, Lemma 1] can be used to obtain the 'reflection' terms as continuous functions of the other 'non-control' data, simply by using the representation of [1] on the appropriate 'non-control' time segments, and we now formalize this.

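Let $J^{1,\varepsilon}=\{[\mu_n^{1,\varepsilon},\bar\mu_n^{1,\varepsilon}),\,n<\infty\}$ denote the sequence of successive intervals (of interpolated time) such that $P_k^{1,\varepsilon}=P_k^{01,\varepsilon}=1$ for $\varepsilon k\in J^{1,\varepsilon}$, and let $J^{2,\varepsilon}=\{[\mu_n^{2,\varepsilon},\bar\mu_n^{2,\varepsilon}),\,n<\infty\}$ denote the successive intervals such that $P_k^{1,\varepsilon}=P_k^{02,\varepsilon}=P_k^{12,\varepsilon}=1$ for $\varepsilon k\in J^{2,\varepsilon}$. The $Y^{i,\varepsilon}(\cdot)$ can increase only on the $J^{i,\varepsilon}$.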

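We can use the representation for the $Y^{ij,\varepsilon}$ of [1] in the pieces between the control intervals. For any function $f(\cdot)$ define $f_n^i(\cdot)=f\big((\mu_n^{i,\varepsilon}+\cdot)\wedge\bar\mu_n^{i,\varepsilon}\big)-f(\mu_n^{i,\varepsilon})$. By [1, Lemma 1], there is a unique continuous function $F(\cdot)=(F_1(\cdot),F_2(\cdot))$ (the continuity in the arguments which are functions is taken to be continuity in the topology of uniform convergence on bounded time intervals) such that

$$\begin{aligned}
Y_n^{10,\varepsilon}(\cdot)+Y_n^{12,\varepsilon}(\cdot)&=F_1\big(X^{1,\varepsilon}(\mu_n^{1,\varepsilon}),W_n^{1,\varepsilon}(\cdot),B_n^{1,\varepsilon}(\cdot),\rho_n^{1,\varepsilon}(\cdot)\big),\\
Y_n^{20,\varepsilon}(\cdot)&=F_2\big(X^{2,\varepsilon}(\mu_n^{2,\varepsilon}),W_n^{2,\varepsilon}(\cdot),B_n^{2,\varepsilon}(\cdot),Y_n^{12,\varepsilon}(\cdot),\rho_n^{2,\varepsilon}(\cdot)\big).
\end{aligned}\tag{3.12}$$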


Furthermore $F(\cdot)$ is 'non-anticipative', the corresponding $X^{i,\varepsilon}(\cdot)$ is non-negative, and the (resp.) left hand sides of (3.12) can increase only at those times when the (resp.) $X^{i,\varepsilon}(\cdot)$ are zero.

Alternatively, there is a unique continuous function $\bar F(\cdot)$ such that

$$\left(Y^{10,\varepsilon}(\cdot)+Y^{12,\varepsilon}(\cdot),\,Y^{20,\varepsilon}(\cdot)\right)=\bar F\left(W^\varepsilon(\cdot),B^\varepsilon(\cdot),\rho^\varepsilon(\cdot),X_0,X^{i,\varepsilon}(\mu_n^{i,\varepsilon}),\mu_n^{i,\varepsilon},\bar\mu_n^{i,\varepsilon},\,i=1,2,\,n<\infty\right)\tag{3.13}$$

where $\bar F(\cdot)$ has the properties ascribed to $F(\cdot)$ above. In particular, the value of the left side of (3.13) at time $t$ depends only on the values of the function arguments of $\bar F$ at times $\le t$ and on those $\mu_n^{i,\varepsilon}$, $\bar\mu_n^{i,\varepsilon}$ with values less than or equal to $t$. Owing to (3.13), we will not need to concern ourselves with the weak convergence of the $Y^{ij,\varepsilon}(\cdot)$ directly: it will follow from the weak convergence of the arguments of $\bar F(\cdot)$.

A Tentative Form for the Limit Control Problem. Purely formally, let the arguments of $\bar F(\cdot)$ converge to $W(\cdot)$, $B(\cdot)$, $\mu_n^i$, $\bar\mu_n^i$ (with $\rho(\cdot)=0$) and let $Y^i(\cdot)$ be the limit of $Y^{i,\varepsilon}(\cdot)$. Then, on each bounded time interval the complement of $\{[\mu_n^i,\bar\mu_n^i),\,n<\infty\}$ will just be a finite set of points, and the controls will be impulses acting at these points. Using this assumed convergence and (3.9b) we will have

$$\begin{aligned}
X^1(t)&=X^1(0)+W^1(t)+B^1(t)+(1-p_{11})Y^1(t)-U^{01}(t)+U^1(t),\\
X^2(t)&=X^2(0)+W^2(t)+B^2(t)+(1-p_{22})Y^2(t)-p_{12}Y^1(t)-U^{02}(t)-U^{c12}(t).
\end{aligned}\tag{3.14}$$


The $Y^{c12}(\cdot)$ can be obtained from the limit $Y^1(\cdot)$ via (3.9). The limits $(1-p_{11})Y^1(\cdot)=\lim\,(Y^{10,\varepsilon}(\cdot)+Y^{12,\varepsilon}(\cdot))$ and $(1-p_{22})Y^2(\cdot)=\lim Y^{20,\varepsilon}(\cdot)$ are to be obtained from the limit of (3.13). Furthermore (as in [1]), the $(1-p_{ii})Y^i(\cdot)$ obtained from the limits in (3.13) are the unique continuous functions which can increase only when $X^i(t)$ is zero and which guarantee that $X^i(t)\ge 0$.
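The characterization just given can be made concrete on a sampled path. The sketch below is an illustration only, not the construction of [1]: it assumes the free path (the non-reflection terms of (3.14) on a segment with no control action) is sampled on a grid, and it exploits the lower-triangular structure of (3.14), in which $X^1$ involves only $Y^1$. Writing $L^i=(1-p_{ii})Y^i$, $L^1$ is the one-dimensional Skorokhod regulator of the first free coordinate, and $L^2$ is the regulator of the second free coordinate after subtracting $p_{12}L^1/(1-p_{11})$. The function names and the Brownian test path are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def skorokhod_1d(z: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One-dimensional Skorokhod map on a sampled path with z[0] >= 0.

    Returns (x, l) with x = z + l >= 0, l non-decreasing, l[0] = 0,
    and l increasing only at indices where x == 0.
    """
    x: List[float] = []
    l: List[float] = []
    running_min = 0.0
    for zk in z:
        running_min = min(running_min, zk)
        lk = 0.0 - running_min          # l(t) = max(0, -min_{s<=t} z(s))
        l.append(lk)
        x.append(zk + lk)
    return x, l


def tandem_reflection(z1, z2, p11, p12, p22):
    """Reflect free paths (z1, z2) with the structure of (3.14):

    X1 = z1 + (1 - p11) Y1,    X2 = z2 + (1 - p22) Y2 - p12 Y1.
    Because X1 does not involve Y2, Y1 can be computed first.
    """
    x1, l1 = skorokhod_1d(z1)                      # l1 = (1 - p11) Y1
    shift = p12 / (1.0 - p11)                      # p12 Y1 = shift * l1
    z2_eff = [a - shift * b for a, b in zip(z2, l1)]
    x2, l2 = skorokhod_1d(z2_eff)                  # l2 = (1 - p22) Y2
    y1 = [v / (1.0 - p11) for v in l1]
    y2 = [v / (1.0 - p22) for v in l2]
    return x1, x2, y1, y2


# Usage on a hypothetical Brownian free path (grid step dt).
rng = random.Random(1)
n, dt = 2000, 1e-3
z1, z2 = [0.3], [0.3]
for _ in range(n - 1):
    z1.append(z1[-1] + rng.gauss(0.0, dt ** 0.5))
    z2.append(z2[-1] + rng.gauss(0.0, dt ** 0.5))
x1, x2, y1, y2 = tandem_reflection(z1, z2, p11=0.1, p12=0.6, p22=0.2)
```

Computing $Y^1$ first and feeding it into the second coordinate works only because the reflection structure is triangular; a general reflection matrix would need a fixed-point iteration instead.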

The $U^{c12}(\cdot)$ can be used to define $U^{12}(\cdot)$ via limits in (3.9). We will have $U^{12}(\cdot)=p_{12}U^1(\cdot)/(1-p_{11})$.
4. Description of the Limit Control Problem

In this section, we define the proper limit control problem for the system of Figure 2.1. First, it will be convenient to picture the effects of various control actions on the $X^\varepsilon(\cdot)$ for small $\varepsilon$. We do this in some detail, since the limit problem is somewhat non-standard, partly owing to the possibility of 'multiple simultaneous impulses'. Also, the set of admissible impulses and associated costs are defined via the possible limits of the controlled $X^\varepsilon(\cdot)$ associated with bounded costs.

Given the limit controlled reflected diffusion $X(\cdot)$, we will need to determine an optimal or $\delta$-optimal policy for it. In order for the 'limit' problem to make sense, for any admissible policy $\pi$ for the limit $X(\cdot)$, there must be a sequence $\pi^\varepsilon$ of policies which can be applied to the $X^\varepsilon(\cdot)$ (i.e., $P_{ij}$, $P_i$ on/off or rate controls) and such that, under $\pi^\varepsilon$, $X^\varepsilon(\cdot)$ converges to $X(\cdot)$ (with policy $\pi$), and the associated costs also converge. Because of this, the limit control problem must be defined in terms of limits of what is possible for the $X^\varepsilon(\cdot)$. This yields a rather interesting limit control problem.

Controls for the Limit Problem. Refer to Figure 4.1, where some typical paths are constructed, under the heavy traffic conditions. Start at point (a) with all $P_i$, $P_{ij}$ on except that $P_{01}$ is off. The path moves to the left and, as $\varepsilon\to 0$, it converges to the horizontal line (a,b). The mean (interpolated) movement to the left in time $\Delta$ is $g_{a1}\Delta/\sqrt{\varepsilon}+O(\Delta)$. Hence in the limit, as $\varepsilon\to 0$, there is an impulsive change.

Now, restart at (d) with only $P_{12}$ off. The path drops, and as $\varepsilon\to 0$ it tends to the vertical line (d,e). In time $\Delta$, the mean drop is $p_{12}g_{d1}\Delta/\sqrt{\varepsilon}+O(\Delta)$. The same path is followed if only $P_{02}$ is off or if $P_1$ and $P_{01}$ are both off, although the 'drop' speed will be different. Now, restart at (e) with only $P_1$ off. The path moves toward (f) (for small $\varepsilon$), and the limit slope can be calculated from

$$\frac{\text{net mean flow into }P_2}{\text{net mean flow into }P_1}=\frac{g_{a2}-(1-p_{22})g_{d2}}{g_{a1}}=-\frac{p_{12}g_{d1}}{g_{a1}}.\tag{4.1}$$

If the path reaches (f), then $P_{01}$ must be turned off. If, at (g), we turn $P_1$ back on (but leave $P_{01}$ off), then the path moves toward (h). The effects of both $P_1$ and $P_{12}$ being off simultaneously are the same as for $P_1$ being off alone. Over small intervals of length $\Delta$, the $A$, $D$ and $Y$ terms in (3.10) contribute very little to the paths (compared to the effects of the control actions), since they converge weakly to continuous functions.

Now refer to (i), and let only $P_{01}$ and $P_{02}$ be off. Then the path moves to (j) with a limit slope calculated as in (4.1), yielding the slope

$$\frac{(1-p_{22})g_{d2}-p_{12}g_{d1}}{(1-p_{11})g_{d1}}.\tag{4.2}$$

Similarly, if only $P_{01}$ and $P_{12}$ are off at (i), then the path moves toward (j) with a limit slope

$$\frac{(1-p_{22})g_{d2}-g_{a2}}{(1-p_{11})g_{d1}}.\tag{4.3}$$
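The slopes (4.1) to (4.3) come from simple bookkeeping of mean flows. The sketch below assumes exact heavy-traffic balance, $g_{a1}=(1-p_{11})g_{d1}$ and $g_{a2}+p_{12}g_{d1}=(1-p_{22})g_{d2}$, and that $P_2$ is never shut off and stays busy; the returned pair is the leading $O(1/\sqrt{\varepsilon})$ mean velocity of $X^\varepsilon(\cdot)$ multiplied by $\sqrt{\varepsilon}$. The function names, the dictionary encoding of the indicators, and the sample rates are illustrative choices.

```python
from typing import Dict, Iterable, Tuple

NAMES = ("P01", "P02", "P1", "P12")


def indicators(off: Iterable[str] = ()) -> Dict[str, int]:
    """All links/processors on (1), except those listed in `off` (0)."""
    off = set(off)
    return {name: 0 if name in off else 1 for name in NAMES}


def mean_velocity(P: Dict[str, int], g_d1: float, g_d2: float,
                  p11: float, p12: float, p22: float) -> Tuple[float, float]:
    """Net mean flows into P1 and P2 (times sqrt(eps)) under indicators P."""
    g_a1 = (1.0 - p11) * g_d1                 # heavy-traffic balance at P1
    g_a2 = (1.0 - p22) * g_d2 - p12 * g_d1    # heavy-traffic balance at P2
    v1 = P["P01"] * g_a1 - P["P1"] * (1.0 - p11) * g_d1
    v2 = (P["P02"] * g_a2 + P["P1"] * P["P12"] * p12 * g_d1
          - (1.0 - p22) * g_d2)
    return v1, v2


def slope(P: Dict[str, int], *rates: float) -> float:
    """dX2/dX1 along the straight-line limit path."""
    v1, v2 = mean_velocity(P, *rates)
    return v2 / v1


# Hypothetical rates: g_d1, g_d2, p11, p12, p22
rates = (1.0, 1.5, 0.2, 0.5, 0.1)
s41 = slope(indicators(["P1"]), *rates)            # only P1 off, (4.1)
s42 = slope(indicators(["P01", "P02"]), *rates)    # P01 and P02 off, (4.2)
s43 = slope(indicators(["P01", "P12"]), *rates)    # P01 and P12 off, (4.3)
```

With every indicator on, both flows vanish, which is the heavy-traffic condition; each single shut-off produces a nonzero flow of order one, i.e., a velocity of order $1/\sqrt{\varepsilon}$ in the scaled state, hence an impulse in the limit.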

All finite sequences of arbitrary lengths of the impulses described in connection with Figure 4.1 are possible. Suppose (e) → (f) → (g) → (h). Then as $\varepsilon\to 0$, it would appear that the limit $X(\cdot)$ jumps from (e) to (h) directly. But this (e) → (h) impulse must be realized as a concatenation of the basic impulses described above. In general the limit control is specified by a sequence of off/on actions for the $P_i$, $P_{ij}$, in a specified order, and with the impulsive distance travelled between successive ('simultaneous') control actions specified. The cost paid for the impulses is precisely the impulsive cost defined by (2.9). The described limitation on the ways in which the impulses for $X(\cdot)$ can be created is important if the control problem for the limit $X(\cdot)$ is to be properly related to that for $X^\varepsilon(\cdot)$. In Section 6, we show that the problem can be quite tractable from a numerical point of view.

The instantaneous changes in the $U^\alpha(\cdot)$ can be readily read off from the limit sequences of simultaneous impulses. For illustration, we do it for the (e,f,g,h) sequence of Figure 4.1. Let $e_i$, etc., denote the $i$th coordinate of the point (e), and let $\delta U^\alpha$ denote the increment in $U^\alpha$. On (e,f), $\delta U^{10}+\delta U^{12}=f_1-e_1$ and $\delta U^{c12}=e_2-f_2$. On (f,g), $\delta U^{01}=\delta U^{10}+\delta U^{12}$, and the value is unimportant, since their effects cancel in (2.8a). Also, $\delta U^{c12}=f_2-g_2$. On (g,h), $\delta U^{01}=g_1-h_1$. All non-specified $\delta U^\alpha$ are zero. The $\delta U^{1j}$ always occur as $(\delta U^{10}+\delta U^{12})$.
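The bookkeeping for the (e,f,g,h) sequence can be checked mechanically: applying the increments just listed to (3.14) must carry (e) to (h). In the sketch below, which is illustrative only, the key `"U1"` stands for $U^{10}+U^{12}$ (the combination in which the $\delta U^{1j}$ always occur), the common increment of $U^{01}$ and $U^{10}+U^{12}$ on (f,g) is a free parameter because its effects cancel, and the sample coordinates are hypothetical points chosen to match the geometry of Figure 4.1 ((f,g) vertical, (g,h) horizontal, and (e,f) with the slope $-0.625$ obtained from (4.1) for $g_{d1}=1$, $p_{11}=0.2$, $p_{12}=0.5$).

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def efgh_increments(e: Point, f: Point, g: Point, h: Point,
                    common: float = 0.0) -> Dict[str, float]:
    """Control increments for the impulse sequence (e)->(f)->(g)->(h)."""
    dU = {"U1": 0.0, "U01": 0.0, "U02": 0.0, "Uc12": 0.0}
    # (e,f): only P1 off; X1 rises and X2 falls
    dU["U1"] += f[0] - e[0]
    dU["Uc12"] += e[1] - f[1]
    # (f,g): P1 and P01 off; U01 and U1 rise together and cancel in X1
    dU["U01"] += common
    dU["U1"] += common
    dU["Uc12"] += f[1] - g[1]
    # (g,h): P1 back on, P01 still off; X1 falls
    dU["U01"] += g[0] - h[0]
    return dU


def apply_increments(x: Point, dU: Dict[str, float]) -> Point:
    """Jump of X implied by (3.14): dX1 = dU1 - dU01, dX2 = -dU02 - dUc12."""
    return (x[0] + dU["U1"] - dU["U01"], x[1] - dU["U02"] - dU["Uc12"])


e, f, g, h = (0.2, 0.9), (0.6, 0.65), (0.6, 0.3), (0.1, 0.3)
ends = [apply_increments(e, efgh_increments(e, f, g, h, common))
        for common in (0.0, 0.37)]
```

The end point does not depend on `common`, which is the sense in which the value on (f,g) is unimportant for the dynamics.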

The Limit Dynamical System. The Wiener Process. The limit system will be (3.14). It will turn out that the limit $W^i(\cdot)$ can be decomposed as follows (using the limits of the three terms in (3.10) which are used to define the $W^{i,\varepsilon}(\cdot)$):

$$\begin{aligned}
W^1(\cdot)&=A^1(\cdot)+W_d^1(\cdot),&W_d^1(\cdot)&=-D^{10}(\cdot)-D^{12}(\cdot),\\
W^2(\cdot)&=A^2(\cdot)+W_d^2(\cdot),&W_d^2(\cdot)&=-D^{20}(\cdot)+D^{12}(\cdot).
\end{aligned}$$

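Here, all the terms are continuous martingales, with $A^1(\cdot)$, $A^2(\cdot)$, $D^{20}(\cdot)$ and $(D^{10}(\cdot),D^{12}(\cdot))$ being mutually orthogonal. The quadratic variation of $A^i(\cdot)$ is $\int_0^t g_{ai}^3\sigma_{ai}^2(X(s))\,ds$ and that of $W_d(\cdot)=(W_d^1(\cdot),W_d^2(\cdot))$ is $\Sigma(t)=\{\Sigma_{ij}(t),\,i,j=1,2\}$, where

$$\begin{aligned}
\Sigma_{11}(t)&=g_{d1}\Big[p_{11}(1-p_{11})t+g_{d1}^2(1-p_{11})^2\int_0^t\sigma_{d1}^2(X(s))\,ds\Big],\\
\Sigma_{12}(t)&=-g_{d1}p_{12}\Big[p_{11}t+g_{d1}^2(1-p_{11})\int_0^t\sigma_{d1}^2(X(s))\,ds\Big],\\
\Sigma_{22}(t)&=g_{d2}\Big[p_{20}(1-p_{20})t+p_{20}^2g_{d2}^2\int_0^t\sigma_{d2}^2(X(s))\,ds\Big]+g_{d1}\Big[p_{12}(1-p_{12})t+p_{12}^2g_{d1}^2\int_0^t\sigma_{d1}^2(X(s))\,ds\Big].
\end{aligned}\tag{4.4}$$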
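The quadratic variation of $A^i(\cdot)$ is of renewal type. As a reminder of where such a constant comes from: if interarrival times have mean $1/g$ and variance $\sigma^2$, the classical renewal central limit theorem gives a count variance of approximately $g^3\sigma^2T$ over $[0,T]$. The Monte Carlo sketch below illustrates that constant, under the assumption (ours, for illustration) that $\sigma_{ai}^2$ plays the role of the interarrival variance on the unscaled time axis. With uniform interarrivals on $(0,2/g)$, $\sigma^2=1/(3g^2)$ and the predicted variance rate is $g/3$. The parameters and names are hypothetical.

```python
import random


def renewal_count(T: float, g: float, rng: random.Random) -> int:
    """Arrivals in [0, T] with i.i.d. Uniform(0, 2/g) interarrival times."""
    t, n = 0.0, 0
    while True:
        t += rng.uniform(0.0, 2.0 / g)
        if t > T:
            return n
        n += 1


def variance_rate(T: float, g: float, reps: int, seed: int = 0) -> float:
    """Sample variance of the arrival count, divided by T."""
    rng = random.Random(seed)
    counts = [renewal_count(T, g, rng) for _ in range(reps)]
    mean = sum(counts) / reps
    return sum((c - mean) ** 2 for c in counts) / ((reps - 1) * T)


g = 2.0
sigma2 = (2.0 / g) ** 2 / 12.0        # variance of Uniform(0, 2/g)
predicted = g ** 3 * sigma2           # renewal CLT constant, here g / 3
estimate = variance_rate(T=200.0, g=g, reps=2000)
```

The estimate carries Monte Carlo error of a few percent and an $O(1/T)$ bias, so only rough agreement with `predicted` should be expected.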

If the $\sigma_{di}$ and $\sigma_{ai}$ are constants, then the covariance is precisely that obtained by Reiman [1] (with a different notation used there).

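It is evident from (4.4) and the cited orthogonality properties that there are mutually independent Wiener processes $w_a^i(\cdot)$, $w_d^i(\cdot)$, $w^{20}(\cdot)$, $(w^{21}(\cdot),w^{22}(\cdot))$, $i=1,2$, where each scalar valued process is standard and with respect to which $X(\cdot)$ is non-anticipative, with $Ew^{21}(t)w^{22}(t)=-\left[p_{11}p_{12}/\big((1-p_{11})(1-p_{12})\big)\right]^{1/2}t$, and

$$\begin{aligned}
A^i(t)&=g_{ai}^{3/2}\int_0^t\sigma_{ai}(X(s))\,dw_a^i(s),\\
W_d^1(t)&=\left[g_{d1}p_{11}(1-p_{11})\right]^{1/2}w^{21}(t)+(1-p_{11})g_{d1}^{3/2}\int_0^t\sigma_{d1}(X(s))\,dw_d^1(s),\\
W_d^2(t)&=\left[g_{d2}p_{20}(1-p_{20})\right]^{1/2}w^{20}(t)+\left[g_{d1}p_{12}(1-p_{12})\right]^{1/2}w^{22}(t)\\
&\quad+p_{20}g_{d2}^{3/2}\int_0^t\sigma_{d2}(X(s))\,dw_d^2(s)-p_{12}g_{d1}^{3/2}\int_0^t\sigma_{d1}(X(s))\,dw_d^1(s).
\end{aligned}\tag{4.5}$$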


The terms involving the $w^{2j}(\cdot)$ are due to the variations in the routing, whereas the terms involving the $w_d^i(\cdot)$ are due to variations in the service times.
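For constant $\sigma$'s, the representation (4.5) can be checked against (4.4): the instantaneous covariance of $(W_d^1,W_d^2)$ built from the coefficients in (4.5) and the correlation of the routing pair (written $w^{21},w^{22}$ here) should reproduce the rates in (4.4). The sketch below does this arithmetic in plain Python. It assumes $p_{20}=1-p_{22}$, $0<p_{11}<1$, $0<p_{12}<1$, and constant $\sigma_{d1},\sigma_{d2}$; the function names and the sample parameters are hypothetical.

```python
import math


def sigma_from_44(g1, g2, p11, p12, p22, s1, s2):
    """Covariance rates of (W_d^1, W_d^2) read off from (4.4), constant sigmas."""
    p20 = 1.0 - p22
    s11 = g1 * (p11 * (1 - p11) + g1**2 * (1 - p11)**2 * s1**2)
    s12 = -g1 * p12 * (p11 + g1**2 * (1 - p11) * s1**2)
    s22 = (g2 * (p20 * (1 - p20) + p20**2 * g2**2 * s2**2)
           + g1 * (p12 * (1 - p12) + p12**2 * g1**2 * s1**2))
    return [[s11, s12], [s12, s22]]


def sigma_from_45(g1, g2, p11, p12, p22, s1, s2):
    """Covariance rates implied by the Wiener representation (4.5)."""
    p20 = 1.0 - p22
    # driving noises, in order: w21, w22, w20, w_d1, w_d2
    rho = -math.sqrt(p11 * p12 / ((1 - p11) * (1 - p12)))
    C = [[1.0, rho, 0.0, 0.0, 0.0],
         [rho, 1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 0.0, 1.0]]
    # rows: coefficients of W_d^1 and W_d^2 on the five noises
    M = [
        [math.sqrt(g1 * p11 * (1 - p11)), 0.0, 0.0,
         (1 - p11) * g1**1.5 * s1, 0.0],
        [0.0, math.sqrt(g1 * p12 * (1 - p12)), math.sqrt(g2 * p20 * (1 - p20)),
         -p12 * g1**1.5 * s1, p20 * g2**1.5 * s2],
    ]
    # covariance = M C M^T
    return [[sum(M[i][a] * C[a][b] * M[j][b]
                 for a in range(5) for b in range(5))
             for j in range(2)] for i in range(2)]


params = (1.3, 0.8, 0.25, 0.45, 0.3, 0.7, 1.1)   # g1, g2, p11, p12, p22, s1, s2
S44 = sigma_from_44(*params)
S45 = sigma_from_45(*params)
```

The negative correlation of $w^{21}$ and $w^{22}$ is what produces the routing part $-g_{d1}p_{11}p_{12}t$ of $\Sigma_{12}$, while the shared $w_d^1$ produces the service-time part.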

The drift terms $B^i(\cdot)$ in (3.14) came from (3.6) and (3.7) and are

$$\begin{aligned}
B^1(t)&=\int_0^t\left[a^1(X(s))-(1-p_{11})d^1(X(s))\right]ds,\\
B^2(t)&=\int_0^t\left[a^2(X(s))-(1-p_{22})d^2(X(s))+p_{12}d^1(X(s))\right]ds.
\end{aligned}$$

Then the limit problem is defined by (3.14).

Admissible Control Actions. The $U^{0i}(\cdot)$, $U^1(\cdot)$ and $U^{c12}(\cdot)$ in (3.14) are non-decreasing piecewise constant functions which have only a finite number of jumps on each finite interval, and they can be taken to be right continuous. They thus correspond to 'impulsive' controls. We first identify the allowed control impulses in the limit model (3.14) with those described above for the discrete model (2.8). The allowed impulsive effects of $U^1$ in (3.14) are those described for $U^{1,\varepsilon}$ in (2.8), as $\varepsilon\to 0$. Also the impulsive effects of $U^{c12}$ are the limits of those of $U^{c12,\varepsilon}$, and the effects of the $U^{0i}$ are those of the $U^{0i,\varepsilon}$ as $\varepsilon\to 0$. This completely characterizes the possibilities for the impulse control of (3.14). Generally, several components of the controls might jump simultaneously, or a single jump in one component might be a consequence of a multiple simultaneous off/on sequence. We must allow these possibilities and distinguish an order for the 'simultaneity', as discussed above, not only because they are possible control actions, but because they are possible limits of control actions for the physical processes. Thus, we count the parts of the multiple simultaneous impulses as distinct impulses. We now develop the notation for keeping track of the necessary information. Recall the definitions of $\tau_n^\alpha$ and $R_n^\alpha$ given below (2.9).

Let $\tau_n$ denote the sequence of event times. The $\tau_n$ are not necessarily distinct, but $\tau_{n+1}\ge\tau_n$ and the subscript $n$ denotes the correct ordering, 'simultaneous' or not. At each event time one or more of the $P_i$ or $P_{ij}$ might shut off or on. What happens is indicated by the vector $R_n=(R_n^{01},R_n^{02},R_n^1,R_n^{12})$, where $R_n^{ij}=1$, $-1$ or $0$ (resp., $R_n^i$) according to whether $P_{ij}$ (resp., $P_i$) is turned off, turned on, or not changed at $\tau_n$. Associated with $(\tau_n,R_n)$ is $\delta U_n=(\delta U_n^{01},\delta U_n^{02},\delta U_n^1,\delta U_n^{c12})$, the instantaneous (at $\tau_n$) change in the controls $U(\cdot)$. To illustrate the procedure refer to the path (e,f,g,h) in Figure 4.1. There are four event times, $\tau_1$ associated with (e), $\tau_2$ with (f), etc. Also $\tau_1=\tau_2=\tau_3=\tau_4$. At $\tau_1$, $R_1^1=1$. At $\tau_2$, $R_2^{01}=1$. At $\tau_3$, $R_3^1=-1$, and at $\tau_4$, $R_4^{01}=-1$. All non-listed $R_n^\alpha$ are zero. The associated impulses $\delta U_n$ are given in the discussion below (4.3).

The $\{\delta U_n,\tau_n,R_n\}$ is said to be a control policy. The policy is said to be admissible if the function

$$\hat{\mathcal R}(t)=\left(X_0,\ \delta U_nI_{\{\tau_n\le t\}},\ R_nI_{\{\tau_n\le t\}},\ \tau_nI_{\{\tau_n\le t\}},\ n<\infty,\ X(t),\ Y(t)\right)\tag{4.7}$$

is non-anticipative with respect to the Wiener processes $w_\alpha(\cdot)$. An equivalent definition of admissibility is that the $A^i(\cdot)$, $D^{ij}(\cdot)$ are martingales with respect to the filtration generated by $\{\hat{\mathcal R}(t),A^i(\cdot),D^{ij}(\cdot)\}$, with the quadratic variation defined in and above (4.4).

Given $W(\cdot)$, $B(\cdot)$, $U(\cdot)$, there are unique processes $X(\cdot)$ and $Y(\cdot)$ such that $Y^i(\cdot)$ increases only when $X^i(t)=0$, and where $X^i(t)\ge 0$ and (3.14) holds, as in [1] (see the end of Section 3). Of course, here $B(\cdot)$ and $W(\cdot)$ might depend on $X(\cdot)$, so it is not known a priori that (4.6) has a unique solution. If the $\sigma_{ai}(\cdot)$, $\sigma_{di}(\cdot)$ do not depend on $x$, then the situation (without controls) is like that in [1] and we do have uniqueness of the solution to (3.14) for each admissible control policy. The $Y(\cdot)$ in (3.14) is obtained from (5.1) below, which is in turn obtained by taking limits in (3.13). In (5.1), $\{\mu_n^1\}$ is the subset of $\{\tau_n\}$ at which both $P_1$ and $P_{01}$ are on, with at least one turned off at $\tau_{n-1}$, and $\{\mu_n^2\}$ is the subset of times at which all of $P_1$, $P_{12}$ and $P_{02}$ are on, with at least one being off at $\tau_{n-1}$.

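For an admissible policy, the cost function (the limit of (2.9)) is

$$\begin{aligned}
V(\pi,x,P)&=E_x^\pi\int_0^\infty e^{-\beta t}k(X(t))\,dt+k_1E_x^\pi\sum_n e^{-\beta\nu_n^1}+\sum_{i=1}^2k_{0i}E_x^\pi\sum_n e^{-\beta\nu_n^{0i}}+k_{12}E_x^\pi\sum_n e^{-\beta\nu_n^{12}}\\
&\quad+E_x^\pi\int_0^\infty e^{-\beta t}\Big[\sum_{i=1}^2q_{0i}\,dU^{0i}(t)+q_{12}\,d\big(U^{c12}(t)-U^{12}(t)\big)\Big].
\end{aligned}\tag{4.8}$$

In (4.8) the $\nu_n^{ij}$, $\nu_n^i$ are defined as the moments of shutting off/on the indicated links or processors, as in Section 2.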
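Once a path and its control events are available (for example from a simulation of (3.14)), the bracketed quantity in (4.8) is a plain discounted sum for that path. The sketch below evaluates it for one sampled path under stated simplifications: the holding integral is a left Riemann sum on a uniform grid, switching events are given as (time, cost coefficient) pairs, and control increments as (time, unit cost, increment) triples, where for the link term the increment is that of $U^{c12}-U^{12}$. All names are hypothetical, and averaging over paths (the expectation $E_x^\pi$) is left to the caller.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]


def path_cost(dt: float, states: Sequence[State],
              k: Callable[[State], float],
              switches: List[Tuple[float, float]],
              jumps: List[Tuple[float, float, float]],
              beta: float) -> float:
    """Discounted cost of one path, following the structure of (4.8)."""
    # holding cost: sum_j exp(-beta t_j) k(X(t_j)) dt, with t_j = j * dt
    holding = sum(math.exp(-beta * j * dt) * k(x) * dt
                  for j, x in enumerate(states))
    # switching costs: k_alpha * exp(-beta nu) for each on/off action
    switching = sum(k_alpha * math.exp(-beta * nu) for nu, k_alpha in switches)
    # impulsive control costs: q_alpha * exp(-beta tau) * dU
    control = sum(q * math.exp(-beta * tau) * du for tau, q, du in jumps)
    return holding + switching + control


# Hypothetical example: constant state, unit holding cost, one switch at
# time 0 with coefficient 2, one increment 0.5 at time ln 2 with unit cost 4.
dt, T = 1e-3, 30.0
states = [(0.5, 0.5)] * int(T / dt)
cost = path_cost(dt, states, lambda x: 1.0,
                 switches=[(0.0, 2.0)],
                 jumps=[(math.log(2.0), 4.0, 0.5)],
                 beta=1.0)
```

In the example the three parts are approximately $1$, $2$ and $4\cdot\tfrac12\cdot\tfrac12=1$, so `cost` is close to 4.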
5. Weak Convergence


We will use

A5.1. The uncontrolled $X(\cdot)$ has a unique solution (in the weak sense) for each initial condition.

Note that (A5.1) implies weak uniqueness of the solution $X(\cdot)$ for any admissible control policy.

Theorem 5.1. Assume (A2.1) to (A2.6) and (A5.1), and let $\sup_\varepsilon V^\varepsilon(\pi^\varepsilon,x_0)<\infty$, for $\pi^\varepsilon=\{R_n^\varepsilon,\tau_n^\varepsilon,\delta U_n^\varepsilon,\,n<\infty\}$ admissible. Then

$$\{A^{1,\varepsilon}(\cdot),A^{2,\varepsilon}(\cdot),(D^{10,\varepsilon}(\cdot),D^{12,\varepsilon}(\cdot)),D^{20,\varepsilon}(\cdot)\}$$

is tight in $D^5[0,\infty)$ (Skorohod topology) and the limits of any weakly convergent subsequence of the four sets (we pair $D^{10}$ and $D^{12}$) are orthogonal continuous martingales. On each $[0,t]$, the mean number of control actions is finite, and the set of intervals on which some control is active converges to a finite set of points. The pieces¹ of $X^\varepsilon(\cdot)$ on the intervals where no controls are active are tight, and the weak limits of these 'pieces' are continuous. The convergences (3.9b) all hold.

Let $\varepsilon$ index a weakly convergent subsequence of $\mathcal R^\varepsilon=\{X_0,A^{i,\varepsilon}(\cdot),D^{ij,\varepsilon}(\cdot),B^\varepsilon(\cdot),R_n^\varepsilon,\tau_n^\varepsilon,\delta U_n^\varepsilon,\,i,ij,n\}$ with limit denoted by $\mathcal R$. Define the process $\mathcal R(\cdot)$ from the limit processes by

$$\mathcal R(t)=\left(X_0,A^i(t),D^{ij}(t),B(t),\,i,ij,\,(R_n,\tau_n,\delta U_n)I_{\{\tau_n\le t\}},\,n<\infty\right).$$

¹ More precisely, define the 'pieces' by shifting the start of the intervals to the origin, and continuing the 'piece' to the right of the interval by setting its value there to be equal to the value at the right end point of the interval.
Then $A^i(\cdot)$ and $D^{ij}(\cdot)$ are martingales on the filtration engendered by the $\mathcal R(t)$, with the quadratic variations given in and above (4.4). The limit policy $\pi=\{R_n,\tau_n,\delta U_n\}$ is admissible for $X(\cdot)$. Except at points where there is control action, (3.14) holds, where $Y^i(\cdot)$ is defined by (5.1) (see (3.13)). We define (w.l.o.g.) $X(t)$ by (3.14) even at points of control action.

$$\left((1-p_{11})Y^1(\cdot),(1-p_{22})Y^2(\cdot)\right)=\bar F\left(W(\cdot),B(\cdot),0,X_0,X^i(\mu_n^i),\mu_n^i,\bar\mu_n^i,\,i=1,2,\,n<\infty\right).\tag{5.1}$$

In (5.1), $\mu_n^i$ and $\bar\mu_n^i$ are the limits of $\mu_n^{i,\varepsilon}$ and $\bar\mu_n^{i,\varepsilon}$, and $X^i(\mu_n^i)$ is the limit of the values $X^{i,\varepsilon}(\mu_n^{i,\varepsilon})$ (the $\{\mu_n^i\}$ are obtainable from the $\{\tau_n,R_n\}$). The $Y^i(\cdot)$ increase only when $X^i(t)=0$. The limits of the uncontrolled sections of $X^\varepsilon(\cdot)$ do not depend on the subsequence, except for their initial conditions.


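Proof. (a) First, we show the convergences (3.9b). We do it for $\tilde U^{10,\varepsilon}(\cdot)=U^{10,\varepsilon}(\cdot)-p_{10}U^{1,\varepsilon}(\cdot)/(1-p_{11})$ only, the rest being treated in the same way. We have that

$$\tilde U^{10,\varepsilon}(t)=\sqrt{\varepsilon}\sum_{k=0}^{t/\varepsilon-1}\frac{I_k^{10}p_{12}-I_k^{12}p_{10}}{p_{10}+p_{12}}\,(1-P_k^1)$$

is a martingale and its variance is bounded by $O(\varepsilon)\,E\sum_{k=0}^{t/\varepsilon-1}(1-P_k^1)\equiv C^\varepsilon(t)$. It is easily seen that

$$\limsup_{\varepsilon\to 0}\sqrt{\varepsilon}\,E\sum_{k=0}^{t/\varepsilon-1}(1-P_k^1)<\infty,$$

for otherwise the buffer of $P_1$ will fill up (one or more times), forcing $P_{01}$ to shut off (one or more times) such that $EU^{01,\varepsilon}(t)$ will diverge and the costs will go to infinity as $\varepsilon\to 0$. Thus $C^\varepsilon(t)=O(\sqrt{\varepsilon})\to 0$ as $\varepsilon\to 0$, which yields the desired result.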
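The variance estimate in (a) can be made explicit. For the martingale $U^{10,\varepsilon}(\cdot)-p_{10}U^{1,\varepsilon}(\cdot)/(1-p_{11})$, one routing decision contributes $(I^{10}p_{12}-I^{12}p_{10})/(p_{10}+p_{12})$, which has mean zero and variance $p_{10}p_{12}/(p_{10}+p_{12})$. Hence, if $\sqrt{\varepsilon}$ times the expected number of steps with $P_1$ off stays bounded by $c$, the variance is at most $c\sqrt{\varepsilon}\,p_{10}p_{12}/(p_{10}+p_{12})$. The sketch below checks the per-step variance and the resulting $O(\sqrt{\varepsilon})$ scaling by simulation. It treats the routing indicators as i.i.d. across steps and uses a deterministic number of off steps; both are simplifications made only for this illustration, and the names are hypothetical.

```python
import random


def step_variance(p10: float, p12: float) -> float:
    """Exact variance of (I10*p12 - I12*p10)/(p10 + p12) for one routing draw."""
    return p10 * p12 / (p10 + p12)


def sample_martingale(eps: float, n_off: int, p10: float, p12: float,
                      rng: random.Random) -> float:
    """sqrt(eps) times the sum of the centered routing terms over n_off steps."""
    total = 0.0
    for _ in range(n_off):
        u = rng.random()
        i10 = 1.0 if u < p10 else 0.0
        i12 = 1.0 if p10 <= u < p10 + p12 else 0.0   # otherwise feedback (p11)
        total += (i10 * p12 - i12 * p10) / (p10 + p12)
    return eps ** 0.5 * total


def empirical_variance(eps: float, c_off: float, p10: float, p12: float,
                       reps: int = 4000, seed: int = 0):
    """Sample variance when sqrt(eps) * (number of off steps) is about c_off."""
    rng = random.Random(seed)
    n_off = round(c_off / eps ** 0.5)
    xs = [sample_martingale(eps, n_off, p10, p12, rng) for _ in range(reps)]
    m = sum(xs) / reps
    return sum((x - m) ** 2 for x in xs) / (reps - 1), n_off


p10, p12 = 0.3, 0.5
results = {eps: empirical_variance(eps, 1.0, p10, p12) for eps in (1e-2, 1e-4)}
```

Reducing $\varepsilon$ by a factor of 100 reduces the variance by roughly a factor of 10, consistent with $C^\varepsilon(t)=O(\sqrt{\varepsilon})$.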


By (3.9b), the $\rho^{i,\varepsilon}(\cdot)$ and $\rho^{c,\varepsilon}(\cdot)$ of (3.10), (3.11) go to zero.

Below, the tightness of $\{W^\varepsilon(\cdot)\}$ will be shown, together with the fact that its limits are continuous. This and the representation (3.13) imply that (for any weakly convergent subsequence) the $Y^{i,\varepsilon}(\cdot)$ converge (in the Skorohod topology) to continuous processes $Y^i(\cdot)$. Thus, via (3.9), we have

$$Y^{c12}(\cdot)=p_{12}Y^1(\cdot).$$

(b) We have $\bar S_d^{1,\varepsilon}(\cdot)$ and $S_d^{1,\varepsilon}(\cdot)$ converging weakly to the processes $\bar S_d^1(\cdot)$ and $S_d^1(\cdot)$, resp., with values $t/g_{d1}$ and $g_{d1}t$, resp. This is more or less obvious since (e.g.) the centered sum of the scaled service intervals has orthogonal increments and its variance tends to zero as $\varepsilon\to 0$. The increments of each $A^{i,\varepsilon}(\cdot)$ and $D^{ij,\varepsilon}(\cdot)$ are also orthogonal. Due to the uniform integrability in (A2.3), these processes are tight and all weak limits are continuous martingales.

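The four elements (with $D^{10}$ and $D^{12}$ paired)

$$\left(A^{1,\varepsilon}(\cdot),A^{2,\varepsilon}(\cdot),(D^{10,\varepsilon}(\cdot),D^{12,\varepsilon}(\cdot)),D^{20,\varepsilon}(\cdot)\right)$$

are mutually orthogonal, and so are the weak limits. To see the mutual orthogonality, one uses a calculation of which the following is typical. Take a 'typical' term from $A^{1,\varepsilon}(\cdot)$ and $D^{1j,\varepsilon}(\cdot)$ and use the definition of $E_{\alpha,n}^\varepsilon$ above (A2.2) and the centering in (3.1) to get (drop the $\varepsilon$ for simplicity)

$$E\Big[I_l^{1j}-p_{1j}\frac{\Delta_l^d}{\bar\Delta_l^d}\Big]\Big[1-\frac{\Delta_n^a}{\bar\Delta_n^a}\Big]=E\big[I_l^{1j}-p_{1j}\big]\Big[1-\frac{\Delta_n^a}{\bar\Delta_n^a}\Big]+E\Big[1-\frac{\Delta_n^a}{\bar\Delta_n^a}\Big]\Big[p_{1j}-p_{1j}\frac{\Delta_l^d}{\bar\Delta_l^d}\Big].$$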

Using the results in the first part of (b) above, and the definitions of $A^{i,\varepsilon}(\cdot)$ and $D^{ij,\varepsilon}(\cdot)$, all weak limits of $A^{1,\varepsilon}(\cdot)$, $A^{2,\varepsilon}(\cdot)$, $(D^{10,\varepsilon}(\cdot),D^{12,\varepsilon}(\cdot))$, $D^{20,\varepsilon}(\cdot)$ are continuous martingales. All the assertions of the theorem (except for the non-anticipativity assertion and the quadratic variation values) follow from the results in parts (a) and (b) above.

(c) Owing to the mutual orthogonality of the four processes $A^{1,\varepsilon}(\cdot)$, etc., and to (A2.3), we can calculate (for the limit process) the quadratic variation and prove the martingale property with respect to the σ-algebra engendered by $\mathcal R(\cdot)$ separately for each component. We do it only for $(D^{10,\varepsilon}(\cdot),D^{12,\varepsilon}(\cdot))$. Let $\varepsilon$ index a weakly convergent subsequence of $\mathcal R^\varepsilon$ and define $\mathcal R(\cdot)$ as in the theorem statement. Let $f(\cdot)$ be a smooth function with compact support and $h(\cdot)$ a bounded and continuous function, both real valued. Let $t$, $t+s$ and $t_k\le t$ below be points such that

$$P(\tau_n\text{ equals }t\text{ or }t+s\text{ or }t_k)=0$$

for each $n,k$. Define $\delta I_n^{1\alpha,\varepsilon}=$ […].

By the uniform integrability (A2.3), the representation of $D^{ij,\varepsilon}(\cdot)$ as a sum, and a truncated Taylor series expansion, we can write (we can assume w.l.o.g. that $\tau_n^\varepsilon\le t$ for only finitely many $n$)


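(5.2) [display not legible in the source]

Now, use the definition of $E_{\alpha,n}^\varepsilon$ given above (A2.2), the centering of $\delta I_n^{1\alpha,\varepsilon}$, and the assumption (A2.6) on the conditional variances to replace $\delta I_n^{1\alpha,\varepsilon}$ in the first sum in (5.2) by zero and the $\delta I_n^{1\alpha,\varepsilon}\delta I_n^{1\beta,\varepsilon}$ in the second by $E_{d,n-1}^\varepsilon\,\delta I_n^{1\alpha,\varepsilon}\delta I_n^{1\beta,\varepsilon}$. This latter quantity is

$$\begin{aligned}
E_{d,n-1}^\varepsilon\Big[I_n^{1\alpha}-p_{1\alpha}\frac{\Delta_n^d}{\bar\Delta_n^d}\Big]\Big[I_n^{1\beta}-p_{1\beta}\frac{\Delta_n^d}{\bar\Delta_n^d}\Big]
&=p_{1\alpha}\delta_{\alpha\beta}-p_{1\alpha}p_{1\beta}+p_{1\alpha}p_{1\beta}\,\frac{\operatorname{var}\Delta_n^d}{(\bar\Delta_n^d)^2}\\
&=p_{1\alpha}\delta_{\alpha\beta}-p_{1\alpha}p_{1\beta}+p_{1\alpha}p_{1\beta}\,g_{d1}^2\sigma_{d1}^2(X_{n-1}^\varepsilon)+(\text{negligible terms}).
\end{aligned}\tag{5.3}$$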
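The identity in the first line of (5.3) uses only that the routing indicator is independent of the normalized service interval $r=\Delta^d/\bar\Delta^d$ and that $Er=1$. Expanding the product then gives $p_{1\alpha}\delta_{\alpha\beta}-2p_{1\alpha}p_{1\beta}+p_{1\alpha}p_{1\beta}(1+\operatorname{var}r)$. The sketch below verifies this by exact enumeration for a two-point distribution of $r$; the routing probabilities and the distribution of $r$ are arbitrary illustrative values.

```python
from itertools import product


def lhs(p, r_dist, a, b):
    """E[(I^a - p_a r)(I^b - p_b r)] by enumeration; routing independent of r."""
    total = 0.0
    for (route, pr), (r, q) in product(p.items(), r_dist):
        ia = 1.0 if route == a else 0.0
        ib = 1.0 if route == b else 0.0
        total += pr * q * (ia - p[a] * r) * (ib - p[b] * r)
    return total


def rhs(p, r_dist, a, b):
    """p_a delta_ab - p_a p_b + p_a p_b var(r), assuming E r = 1."""
    var_r = sum(q * (r - 1.0) ** 2 for r, q in r_dist)
    return p[a] * (1.0 if a == b else 0.0) - p[a] * p[b] + p[a] * p[b] * var_r


# routes from P1: "0" = exit, "1" = feedback, "2" = to P2
p = {"0": 0.35, "1": 0.2, "2": 0.45}
r_dist = [(0.5, 0.5), (1.5, 0.5)]      # (value, probability): mean 1, variance 0.25
```

For example, with $\alpha=\beta=0$ both sides equal $0.258125$.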


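The limit (as $\varepsilon\to 0$) of the double sum in (5.2) is

$$\frac12\int_{S_d^1(t)}^{S_d^1(t+s)}\sum_{\alpha,\beta=0,2}f_{\alpha\beta}\big(D^{10}(\tau/g_{d1}),D^{12}(\tau/g_{d1})\big)\,\tilde\Sigma_{\alpha\beta}(\tau)\,d\tau,$$

where

$$\begin{aligned}
\tilde\Sigma_{00}(\tau)&=p_{10}-p_{10}^2+p_{10}^2g_{d1}^2\sigma_{d1}^2(X(\tau/g_{d1})),\\
\tilde\Sigma_{02}(\tau)&=-p_{10}p_{12}+p_{10}p_{12}g_{d1}^2\sigma_{d1}^2(X(\tau/g_{d1})),\\
\tilde\Sigma_{22}(\tau)&=p_{12}-p_{12}^2+p_{12}^2g_{d1}^2\sigma_{d1}^2(X(\tau/g_{d1})),
\end{aligned}$$

where we used (5.3) and the fact that $\bar S_d^{1,\varepsilon}(t)\to t/g_{d1}$ to get the proper limit of the argument of $\sigma_{d1}(\cdot)$.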


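Now, recalling that $S_d^1(t)=g_{d1}t$, and taking limits in (5.2) yields

$$\begin{aligned}
E\,h\big(X(t_k),A^i(t_k),D^{ij}(t_k),B^i(t_k),(R_n,\tau_n,\delta U_n)I_{\{\tau_n\le t_k\}},\,n,k\big)\Big[&f\big(D^{10}(t+s),D^{12}(t+s)\big)-f\big(D^{10}(t),D^{12}(t)\big)\\
&-\frac12\int_{tg_{d1}}^{(t+s)g_{d1}}\sum_{\alpha,\beta=0,2}f_{\alpha\beta}\big(D^{10}(\tau/g_{d1}),D^{12}(\tau/g_{d1})\big)\tilde\Sigma_{\alpha\beta}(\tau)\,d\tau\Big]=0.
\end{aligned}\tag{5.4}$$

The arbitrariness of $h(\cdot)$, $f(\cdot)$, and $t$, $t+s$, $t_k$ (possibly excluding a countable set) implies that $(D^{10}(\cdot),D^{12}(\cdot))$ is a martingale with respect to the asserted filtration.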

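The quadratic variation can be obtained from (5.4) via a change of variables and is $\int_0^t\Sigma(\tau)\,d\tau$, where $\Sigma(\cdot)=\{\Sigma_{\alpha\beta}(\cdot),\,\alpha,\beta=0,2\}$ and

$$\begin{aligned}
\Sigma_{00}(t)&=g_{d1}\big[p_{10}-p_{10}^2+p_{10}^2g_{d1}^2\sigma_{d1}^2(X(t))\big],\\
\Sigma_{02}(t)&=g_{d1}\big[-p_{10}p_{12}+p_{10}p_{12}g_{d1}^2\sigma_{d1}^2(X(t))\big],\\
\Sigma_{22}(t)&=g_{d1}\big[p_{12}-p_{12}^2+p_{12}^2g_{d1}^2\sigma_{d1}^2(X(t))\big].
\end{aligned}$$

With analogous calculations for $D^{20,\varepsilon}(\cdot)$ and for the $A^{i,\varepsilon}(\cdot)$, we get the quadratic variation for the $W_d(\cdot)$, $A^i(\cdot)$, $D^{ij}(\cdot)$, as given in Section 4.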

By the above argument the limit policy $\{\tau_n,R_n,\delta U_n\}$ is 'non-anticipative' with respect to the martingales, or their generating Wiener processes $w_\alpha(\cdot)$. Owing to the way it was obtained as the limit of the $\{\tau_n^\varepsilon,R_n^\varepsilon,\delta U_n^\varepsilon\}$, the policy $\{\tau_n,R_n,\delta U_n\}$ is admissible in the sense that it corresponds to admissible sequences of impulses corresponding to the sequence of off/on controls as discussed in Section 4.



Q.E.D. 


Extension. Consider the graph of $X^\varepsilon(\cdot)$ ($X^{1,\varepsilon}(\cdot)$ plotted vs. $X^{2,\varepsilon}(\cdot)$) in the state space during a fixed control action. It can be shown that the graph converges uniformly (in probability) to the limit straight lines given by Figure 4.1, or by the considerations leading to it in other cases. The convergence is in the sense that the maximum value of the distance between any point on (this part of) the graph of $X^\varepsilon(\cdot)$ and the closest point on the limit straight line goes to zero in probability.


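Theorem 5.2. Assume (A2.1) to (A2.6) and (A5.1), and let $\varepsilon$ index a weakly convergent subsequence with limit $\mathcal R(\cdot)$. Then (with $\pi$ defined as in Theorem 5.1) for any $P$

$$\liminf_\varepsilon V^\varepsilon(\pi^\varepsilon,x,P)\ge V(\pi,x,P).\tag{5.5}$$

Define $N^{\alpha,\varepsilon}(t)$ to be the number of actions of the control $P_\alpha$ on the interval $[0,t]$. If

$$\{N^{\alpha,\varepsilon}(n+1)-N^{\alpha,\varepsilon}(n),\ \alpha,\ n<\infty,\ \varepsilon\}\tag{5.6}$$

is uniformly integrable, then

$$\lim_\varepsilon V^\varepsilon(\pi^\varepsilon,x,P)=V(\pi,x,P).\tag{5.7}$$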


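Proof. The relation (5.5) is just a consequence of Fatou's Lemma and the weak convergence. Now, let the uniform integrability hold. Then, certainly the holding costs and the impulsive control costs in (2.9) converge to their limits, as given by the terms in (4.8). We need only work with the last integral in (2.9). The arguments for each component are essentially the same, and we work with the $U^{01,\varepsilon}(\cdot)$ term only, assuming that $P_1$ is on. If $P_1$ might also be off part of the time, the argument is a little more involved (involving the $X^{2,\varepsilon}$ as well as the $X^{1,\varepsilon}$), but is essentially the same.

When $P_{01}$ is off, the increments in the $Y^{1j,\varepsilon}(\cdot)$ are zero. [If $X^{1,\varepsilon}(t)=0$, we must have $P_{01}$ on, by (A2.1).] Let $\nu_n^{01,\varepsilon}$ and $\bar\nu_n^{01,\varepsilon}$ denote the successive times at which $P_{01}$ is shut off and turned back on. We can write

$$\begin{aligned}
U^{01,\varepsilon}(t)&=\sum_n\left[U^{01,\varepsilon}(\bar\nu_n^{01,\varepsilon}\wedge t)-U^{01,\varepsilon}(\nu_n^{01,\varepsilon}\wedge t)\right]\\
&=\sum_n\left[W^{1,\varepsilon}(\bar\nu_n^{01,\varepsilon}\wedge t)-W^{1,\varepsilon}(\nu_n^{01,\varepsilon}\wedge t)\right]-\sum_n\left[X^{1,\varepsilon}(\bar\nu_n^{01,\varepsilon}\wedge t)-X^{1,\varepsilon}(\nu_n^{01,\varepsilon}\wedge t)\right]\\
&\quad+\sum_n\left[B^{1,\varepsilon}(\bar\nu_n^{01,\varepsilon}\wedge t)-B^{1,\varepsilon}(\nu_n^{01,\varepsilon}\wedge t)\right]+(\text{terms which}\to 0\text{ as }\varepsilon\to 0).
\end{aligned}$$

For some $K_1<\infty$, the last two sums on the right are bounded by $K_1N^{01,\varepsilon}(t)$, which is uniformly integrable by hypothesis. By the orthogonality properties of the summands in the expression for the $W^{1,\varepsilon}(\cdot)$, the mean square value of the $W^{1,\varepsilon}$ sum is $O(t+1)$. This yields the uniform integrability of $\{U^{01,\varepsilon}(t)\}$ for each $t$ and of $\{U^{01,\varepsilon}(n+1)-U^{01,\varepsilon}(n),\ \varepsilon>0,\ n<\infty\}$. By the weak convergence and the uniform integrability of these and the other terms in the last integral of (2.9), the assertion (5.7) follows.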
It is not a priori obvious that there is a control policy for which (5.6) is uniformly integrable, since we must shut off the inputs to $P_1$ whenever its buffer is full. We will define a standard 'comparison' control policy called the $\Delta_0$-boundary policy. It will be useful since its properties imply that we can always assume the uniform integrability of (5.6) for the optimal or $\delta$-optimal policies for the $X^\varepsilon(\cdot)$. Let $\Delta_0\in(0,\min(B_1,B_2)/4)$ and refer to Figure 5.1. If $X^{2,\varepsilon}=B_2$, then shut off all inputs to $P_2$ until $X^{2,\varepsilon}$ reaches $B_2-\Delta_0$. Then turn them back on. If at the end of that time $B_1-\Delta_0<X^{1,\varepsilon}\le B_1$, shut $P_{01}$ off until $X^{1,\varepsilon}=B_1-\Delta_0$. If $X^{1,\varepsilon}=B_1$, then shut $P_{01}$ off until $X^{1,\varepsilon}$ reaches $B_1-\Delta_0$. Then turn $P_{01}$ back on. We use the analogous definition for the $\Delta_0$-boundary policy for $X(\cdot)$. Then, if ever $X^\varepsilon(\cdot)$ or $X(\cdot)$ hits the outer boundary, we control it to a distance at least $\Delta_0$ (in each coordinate) from the outer boundary.
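The $\Delta_0$-boundary policy is a small state machine. The sketch below encodes one decision step as a function of the current scaled buffer contents and the current on/off settings. It reflects our reading of the rules above, with these assumptions: the 'inputs to $P_2$' are taken to be $P_{02}$ and $P_{12}$, $P_1$ is never switched by this policy, and decisions are made at discrete checks. The class and function names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Controls:
    """On (1) / off (0) settings touched by the Delta_0-boundary policy."""
    P01: int = 1
    P02: int = 1
    P12: int = 1


def delta0_step(x1: float, x2: float, c: Controls,
                B1: float, B2: float, d0: float) -> Controls:
    """One decision of the Delta_0-boundary policy (sketch)."""
    p2_inputs_off = c.P02 == 0 or c.P12 == 0
    # P2 buffer full: shut all inputs to P2
    if x2 >= B2:
        c = replace(c, P02=0, P12=0)
    # P2 driven down to B2 - d0: reopen its inputs and, if X1 is within
    # d0 of its boundary, shut P01 until X1 reaches B1 - d0
    elif p2_inputs_off and x2 <= B2 - d0:
        c = replace(c, P02=1, P12=1)
        if B1 - d0 < x1 <= B1:
            c = replace(c, P01=0)
    # P1 buffer full: shut P01; reopen once X1 has come down to B1 - d0
    if x1 >= B1:
        c = replace(c, P01=0)
    elif c.P01 == 0 and x1 <= B1 - d0:
        c = replace(c, P01=1)
    return c


# Hypothetical trajectory of checks with B1 = B2 = 1 and Delta_0 = 0.25
B1 = B2 = 1.0
d0 = 0.25
trace = [Controls()]
for x1, x2 in [(0.5, 1.0), (0.875, 0.875), (0.875, 0.75), (0.75, 0.5), (1.0, 0.5)]:
    trace.append(delta0_step(x1, x2, trace[-1], B1, B2, d0))
```

Because every shut-off lasts until the state has moved a distance $\Delta_0$, the number of switches in a unit of time is controlled by the number of returns to the outer boundary, which is what Theorem 5.3 exploits.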

Theorem 5.3. Assume (A2.2) to (A2.6). Then for the $\Delta_0$-boundary control and each $k<\infty$

$$\sup_{\alpha,x,n,\ \varepsilon\text{ small}}E_x\left|N^{\alpha,\varepsilon}(n+1)-N^{\alpha,\varepsilon}(n)\right|^k<\infty,\tag{5.8}$$

and similarly for the 'jump numbers' of the limit process $X(\cdot)$.

Remark on the proof. Refer to Figure 5.1. Let $\tau_i^\varepsilon$ denote the $i$th time of return of $X^\varepsilon(\cdot)$ to the outer boundary after the $i$th time that the control takes the process to the set $[0,B_1-\Delta_0]\times[0,B_2-\Delta_0]$. One can readily show that for any $\delta_0\in(0,1)$, there is $T_0>0$ such that

$$\sup_{x,i,\ \varepsilon\text{ small}}P\{\tau_{i+1}^\varepsilon-\tau_i^\varepsilon<T_0\mid\text{data up to }\tau_i^\varepsilon\}<1-\delta_0.\tag{5.9}$$

This is just a consequence of the properties of $W^\varepsilon(\cdot)$, $B^\varepsilon(\cdot)$ and of the fact that $dU^{\alpha,\varepsilon}(\cdot)=0$ on the intervals of interest. With (5.9), it is not hard to show that all the moments of $N^{\alpha,\varepsilon}(iT_0+T_0)-N^{\alpha,\varepsilon}(iT_0)$ are bounded, uniformly in $i$ and $\varepsilon$ and in the initial condition. (Similarly for the $X(\cdot)$ process.) This yields the desired result. See the proof of Theorem 5.3 in [7] for a related result for a problem with a more complicated statistical structure.
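The step from (5.9) to the moment bounds can be spelled out as follows. This is a sketch under two simplifying assumptions, stated explicitly: $N$ denotes the number of returns to the outer boundary in a window $[iT_0,iT_0+T_0]$, and each such return triggers at most a fixed number $c$ of control actions under the $\Delta_0$-boundary policy, so that $N^{\alpha,\varepsilon}(iT_0+T_0)-N^{\alpha,\varepsilon}(iT_0)\le c(N+1)$. For $N\ge m$ to occur, $m-1$ successive inter-return times must each be shorter than $T_0$, and (5.9) bounds each such event by $1-\delta_0$ conditionally on the past.

```latex
\begin{aligned}
P\{N\ge m\} &\le (1-\delta_0)^{m-1}, \qquad m\ge 1,\\
E N^k &= \sum_{m\ge 1}\big(m^k-(m-1)^k\big)\,P\{N\ge m\}
      \le \sum_{m\ge 1} m^k(1-\delta_0)^{m-1} < \infty .
\end{aligned}
```

The final bound depends only on $\delta_0$ and $k$, which is where the uniformity in $i$, $\varepsilon$ and the initial condition comes from.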

The optimality and 'almost' optimality theorem. At the present time almost nothing is known about optimal or $\delta$-optimal ($\delta>0$) policies for the $X^\varepsilon(\cdot)$. This is one of the basic reasons for considering suitably adapted policies which are 'good' for $X(\cdot)$. Unfortunately, we know little about the optimal or $\delta$-optimal policies for $X(\cdot)$ either. Thus, we must postulate (in (A5.2)) the existence of a $\delta$-optimal policy with certain smoothness properties. The assumption appears to be eminently reasonable, since there is usually enormous flexibility in the smoothing that can be put on $\delta$-optimal controls. The numerical results obtained via the methods described in Section 6 satisfy (A5.2) for all the cases tried, in the sense that the 'control decision' surfaces (discretized for the numerical calculation) seem to have the required properties. In fact, the situation in Figure 5.1 is more or less typical, in the sense that the decision surfaces are usually some continuous deformation of those shown there.

For our current purposes, it is best to view the path $X(\cdot)$ as its graph in the state space. The uncontrolled sections are the graphs of the paths of the uncontrolled reflected diffusion, and the controlled sections are straight lines, each one (or perhaps part of one) corresponding to a different value of the set of indicators $P = (P^{01}, P^{02}, P^{1}, P^{12})$. In a sense, (A5.2) is a long-winded and formal way of saying that the lengths of the straight line segments are piecewise continuous in their starting point. It also deals with the possibility that the initial $P$ might be inappropriate for the initial state $x$, and that we might have to change the control settings instantaneously at $t = 0$. We have tried to give a general description of what can reasonably be expected. The situation might be simpler in special cases, but it seems likely that the useful $\delta$-optimal (or even optimal) control policies would be described by (A5.2), owing to the nature of the impulse sequences. Note that (the $k_\alpha$ are the cost coefficients in (2.6))

$$K = 1 + \frac{\sup_{x,P}\,[V(x,P) + 1]}{\min_\alpha k_\alpha}$$

is an upper bound for the number of 'simultaneous impulses' (the above number of sequential line segments) for the $\delta$-optimal controls with $\delta < 1$, since each impulse costs at least $\min_\alpha k_\alpha$ while the total cost of such a control is less than $V(x,P) + 1$. We know that $\sup_{x,P} V(x,P) < \infty$, owing to the properties of the comparison $\Delta_0$-boundary control of Figure 5.1.
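The bound $K$ is simple arithmetic once $\sup_{x,P} V(x,P)$ and the $k_\alpha$ are known. A sketch with made-up numbers; the values and the function name `impulse_bound` are hypothetical, chosen only to show the computation.

```python
def impulse_bound(v_sup: float, switch_costs: dict) -> float:
    """K = 1 + (sup_{x,P} V(x,P) + 1) / min_alpha k_alpha."""
    k_min = min(switch_costs.values())
    if k_min <= 0:
        raise ValueError("the bound needs strictly positive switching costs")
    return 1.0 + (v_sup + 1.0) / k_min


# Hypothetical switching costs k_alpha for alpha in {01, 02, 1, 12}.
costs = {"01": 2.0, "02": 5.0, "1": 3.0, "12": 4.0}
print(impulse_bound(9.0, costs))
```

Only the cheapest switch enters the bound, since a run of impulses is at its longest when every impulse is of the cheapest kind.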

We require some 'smoothness' in the $\delta$-optimal 'feedback' controls, since we need to adapt them for use with the $X^\epsilon(\cdot)$ process and will require that the corresponding sequence $\{X^\epsilon(\cdot)\}$ (and the associated costs) converge appropriately to $X(\cdot)$ (and its associated cost).

The boundaries of the sets $G(1)$ and $G_i(P)$ below are smooth in that they are composed of a finite number of differentiable curves which are not tangent at the points of intersection. We use $P$ to denote the control value just before a decision to change the control is made, and $P_1$ to denote the new control value just after the decision is made. Recall that $P = 1$ is used to denote $P = (1,1,1,1)$.

We could replace (A5.2) by the simpler assumption that for each $\delta > 0$ there is a $\delta$-optimal admissible policy $\pi^\delta$ for $X(\cdot)$ and admissible policies $\pi_\epsilon^\delta$ for $X^\epsilon(\cdot)$ such that $X^\epsilon(\cdot)$ (under $\pi_\epsilon^\delta$) $\Rightarrow X(\cdot)$ (under $\pi^\delta$), and the associated costs converge. (A5.2) simply defines a reasonable class of policies for which this can be done. The interiors of all sets in (A5.2) are relative to $G = [0,B_1] \times [0,B_2]$.

A5.2. For each $\delta > 0$, there is a 'feedback' policy $\pi^\delta$ for $X(\cdot)$ which is $\delta$-optimal in the sense that it satisfies (A2.1) and

$$V(x,P) = \inf_{\pi\ \text{adm}} V(\pi,x,P) > V(\pi^\delta,x,P) - \delta \tag{5.10}$$

for all $x, P$, and which has the following properties.

(a) Let $P = 1$. Then there is a decision set $G(1)$ whose boundary is divided into a finite number of segments. Each segment is associated with a switch to some $P_1 \neq 1$ when $X(\cdot)$ hits it from the outside. The segment associated with each $P_1$ is strictly interior to one of the sets $G_i(P_1)$ below.

(b) For each $P \neq 1$, there are a finite number (perhaps zero; see the remark in (c) below) of sets $G_i(P)$ whose interiors are disjoint. If $x \in G_i(P)$ and $P$ is used, then it is used until the boundary of $G_i(P)$ is reached. The distance (taken along the graph of $X(\cdot)$, which is a straight line) from $x \in G_i(P)$ to the boundary of $G_i(P)$ is a continuous function of $x$. The (straight line) graph is (uniformly) not tangent to the boundary at any point of contact. The boundary is divided into a finite number of segments, each associated with a new control setting, perhaps with $P_1 = 1$. These segments are strictly interior to some set $G_j(P_1)$ for the new value $P_1$.

At the corners of the segments of $\partial G_i(P)$ or $\partial G(1)$, any policy associated with the intersecting segments can be used. There is $\Delta_1 > 0$ such that after a finite number of switches, we have $P = 1$ and $X(\cdot)$ is a distance $\geq \Delta_1$ from $G(1)$.

(c) It is possible that there will be an immediate change $P \to$ some $P_1 \neq P$ at $t = 0$. If this occurs, we want the line segment of the graph of $X(\cdot)$ after the switch to correspond to $P_1$ for at least a minimum distance independent of $x$. (This seems to be rather unrestrictive.) We formalize this as follows.

(c1) If we do not switch at $t = 0$, then assume that $x \in$ some $G_i(P)$ above.

(c2) If we do switch (to some $P_1 \neq P$) at $t = 0$, then assume that $x \in$ some $G_i(P_1)$ above and $\inf_{x \in G_i(P_1)} d[x, \partial G_i(P_1)] > 0$.

Remark. The assumption concerning 'points in common' to several $\partial G_i(P)$ does not seem to be restrictive. Generally, in dynamic programming, when the state is on the boundary of sets corresponding to different policies, any one of the policies is optimal. Condition (A5.2) is intended to be illustrative of the possibilities that we can allow.


Adapting $\pi^\delta$ to $X^\epsilon(\cdot)$. To adapt the policy $\pi^\delta$ for use with $X^\epsilon(\cdot)$, we simply take as the moments of decision the moments when $X^\epsilon(\cdot)$ hits the decision boundary segments.
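The adaptation rule needs only the entrance times of $X^\epsilon(\cdot)$ into a decision set. A minimal sketch for a path sampled at discrete times, assuming a hypothetical decision set $G(1) = \{x : x^1 \ge 3\}$. It detects decision moments only and ignores how the new control then changes the path. Index 0 is reported when the path starts inside the set, which we treat as an immediate change at $t = 0$ in the spirit of (A5.2)(c).

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def decision_times(path: Sequence[Point],
                   in_decision_set: Callable[[Point], bool]) -> List[int]:
    """Indices at which a sampled path enters a decision set from outside."""
    times: List[int] = []
    inside_before = False  # so a start inside the set is reported as index 0
    for n, point in enumerate(path):
        inside = in_decision_set(point)
        if inside and not inside_before:
            times.append(n)
        inside_before = inside
    return times


def in_g1(x: Point) -> bool:
    """Hypothetical decision set G(1) = {x : x^1 >= 3}."""
    return x[0] >= 3.0


path = [(0.0, 1.0), (1.5, 1.0), (3.2, 1.1), (2.9, 1.0), (3.4, 0.8)]
print(decision_times(path, in_g1))
```

With a sampled path, the hitting time is only located to within one sampling step; this is the discrete analogue of the hitting-time convergence used in the proof of Theorem 5.4.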


We now prove the 'almost $\delta$-optimality' of $\pi^\delta$ applied to $X^\epsilon(\cdot)$. Theorem 5.4 says essentially that a 'nice' control which is almost optimal for $X(\cdot)$ will also be almost optimal for $X^\epsilon(\cdot)$. This justifies the use of the limit approximations for the purpose of getting good or nearly optimal controls.

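Theorem 5.4. Assume (A2.2) to (A2.6), (A5.1) and (A5.2). Let $\pi_\epsilon^\delta$ denote the policy of (A5.2) adapted to $X^\epsilon(\cdot)$. Then

$$V^\epsilon(\pi_\epsilon^\delta, x, P) \to V(\pi^\delta, x, P) \tag{5.11}$$

uniformly in $x$. For admissible $\pi^\epsilon$ and small $\epsilon$,

$$\sup_{x}\ \sup_{\{\pi^\epsilon\}} \left[V^\epsilon(\pi_\epsilon^\delta, x, P) - V^\epsilon(\pi^\epsilon, x, P)\right] < 2\delta. \tag{5.12}$$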

Proof. The proof is a consequence of the weak convergence in Theorems 5.1 and 5.3, the piecewise continuity properties of (A5.2) and an estimate of the type obtained in Theorem 5.2, and we only outline some of the argument.

(a) The facts that the boundary segments of $G(1)$ are piecewise differentiable with non-tangent corners and that the uncontrolled $X(\cdot)$ is non-degenerate imply that the hitting times (and locations) of $X^\epsilon(\cdot)$ on $\partial G(1)$ converge to those for the limit $X(\cdot)$, for any initial condition outside $G(1)$.

(b) Similarly for the hitting times and locations of the boundaries of the $G_i(P)$, when $P \neq 1$.

(c) The uncontrolled segments of $X^\epsilon(\cdot)$ converge to those of $X(\cdot)$. The graphs of the controlled segments of $X^\epsilon(\cdot)$ converge uniformly to their limit straight line segments, as discussed in the remark after Theorem 5.1.

(d) If a limit point of $X^\epsilon(\cdot)$, or a limit point of an end point of a segment of the graph during a control interval, is on a corner of the boundary of $G(1)$ or of some $G_i(P)$, then the limit control action just after contact with the boundary there is, of course, specified by the limit of the control actions of $X^\epsilon(\cdot)$ just after contact with the boundary. But, by (A5.2), it is irrelevant which of the actions associated with that boundary point is used for $X(\cdot)$. Whatever it is, it will be used for a positive minimum distance (on the graph).

(e) Let $N^\epsilon(t)$ denote the number of distinct control actions on $[0,t]$. Then a proof such as would be used to prove Theorem 5.3, together with the weak convergence and the fact that $\Delta_1 > 0$, can be used to show that $\{N^\epsilon(n+1) - N^\epsilon(n),\ n < \infty,\ \epsilon > 0\}$ is uniformly integrable. (This is then used as in Theorem 5.2.)

(f) Let $\epsilon$ index a weakly convergent subsequence. The limit process is the $X(\cdot)$ associated with $\pi^\delta$. By (A5.1) and (A5.2), the particular subsequence used is irrelevant.

(g) (5.11) follows from the above facts and Theorems 5.1 and 5.2.

(h) (5.12) follows from Theorem 5.2 and the fact that $\pi^\delta$ is $\delta$-optimal for $X(\cdot)$. The limits of the controls $\{\pi^\epsilon\}$ might depend on the subsequence, but (5.12) holds uniformly in the subsequence.

Extensions. The arrival and service time sequences can each be correlated (e.g., service in 'random batches'), provided that they satisfy suitable mixing conditions. If they are correlated and state dependent, then the 'first order perturbed test function method' of [5, Chapter 5] (see also [6]) can be adapted.

It is also possible to control the service or arrival rates (the marginal $a^i$, $d^i$). Impulsive controls (hence piecewise constant rates) are easy to accommodate here. Otherwise, one can introduce relaxed controls as in [6], writing (e.g.) the drift term as $\int b^i(X^\epsilon(s),\alpha)\,m_s(d\alpha)$, where $m_s(\cdot)$ is the measure associated with the relaxed control. We do need to maintain heavy traffic, of course. The variances can also be allowed to be control dependent. There is no problem allowing this 'impulsively', but for continuously controlled variances there is still some uncertainty concerning the appropriate description of the limit problem.

For more general feedforward-branching networks, controlling the $p_{ij}$ might also be of interest. One could use $p_{ij}^\epsilon = p_{ij} + \sqrt{\epsilon}\,\delta p_{ij} + o(\sqrt{\epsilon})$. Then, when the 'principal terms' are cancelled in (3.5), we are left with an additional $O(1)$ term depending on $\{\delta p_{ij}\}$, and this corresponds to an additional drift associated with the 'marginal' control of the routing. Various types of controlled priority service are possible and might be the subject of a future paper. For example, the customers might fall into various priority classes which relate to service time distributions. We might control the priority service subject to holding costs depending on the priority.

The average cost per unit time problem is trickier, but one can adapt the scheme for the ergodic problem in [6]. Here ($X^\epsilon(\cdot)$, vector of elapsed times since the last service completions or arrivals) would replace the vector $(X^\epsilon(\cdot), \xi^\epsilon(\cdot))$ of [6]. Then, under appropriate ergodicity conditions concerning the $\delta$-optimal process, we can extend Theorem 5.4.


6.  A  Numerical  Method  for  Approximating  the 
Optimal  Value  Function  and  Control 


The control problem defined by the cost (4.8), the system (3.14) and the off/on impulsive control actions discussed in connection with Figure 4.1 can be approximated by the numerical methods studied in [9]. The method in [9] involved a Markov chain approximation (indexed by a 'finite difference' approximation parameter) to the optimal continuous time problem. One then showed that the sequence of value functions for the chains converged to the optimal value function for the continuous parameter problem, and that suitable continuous parameter interpolations of the chain converge weakly to the optimal controlled continuous parameter process. The methods of [9] can be readily adapted to our problem, and only an outline will be given. Owing to the reflection term, the weak convergence methods used in [9] will have to be replaced by the methods here, but the general idea is the same.

Let $h$ be a finite-difference approximation parameter, and let $B_1$, $B_2$ be integral multiples of $h$. Let $G_h$ denote the $h$-grid on $G = [0,B_1] \times [0,B_2]$. Define $a(\cdot) = \{a_{ij}(\cdot)\}$ by $\Sigma(t) = \int_0^t a(X(s))\,ds$, and generally omit the $x$-argument in the $a_{ij}(\cdot)$ and $b^i(\cdot)$ below. For the Markov chain approximation, the status of the controls at any time is defined by the vector $P = (P^{01}, P^{02}, P^{1}, P^{12})$, where $P^\alpha = 1$ (resp., 0) denotes that the control is on, i.e., the link is operating normally (resp., closed). Recall that, when $P = (1,1,1,1)$, we write $P = 1$.

Let $\{X_n^h\}$ denote the approximating Markov chain, and let $x$ denote the canonical current state, $y$ the canonical successor state and $P_1$ the canonical control which will be used at state $x$ to bring the chain to the next state. Define $X^h(\cdot)$, the interpolated process, to be the right continuous piecewise constant process with interpolation intervals $\Delta t^h(x,P_1)$. Both these intervals and the transition probabilities $p^h(x,y \mid P_1)$ depend on the newly chosen control as well as on the current state. If $P_1 \neq 1$, we use $\Delta t^h(x,P_1) = 0$; i.e., the interpolation interval has zero length. In this case, several steps of $\{X_n^h\}$ all occur simultaneously in the interpolation $X^h(\cdot)$. Define

$$Q_h(x) = 2\left[a_{11} + a_{22} - |a_{12}|\right] + h\left(|b^1| + |b^2|\right),$$

and assume that $a_{ii} - |a_{12}| > 0$, $i = 1,2$. For $P_1 = 1$, we use $\Delta t^h(x,1) = h^2/Q_h(x)$.

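We now define the transition probabilities $p^h(x,y \mid P_1)$ for the chain when $P_1 = 1$, for $x, y \in G_h$. Let $e_i$ denote the unit vector in the $i$th coordinate direction. We use

$$\begin{aligned} p^h(x, x \pm e_i h \mid P_1 = 1) &= \frac{a_{ii} - |a_{12}| + h\,(b^i)^{\pm}}{Q_h(x)}, \\ p^h(x, x + e_1 h - e_2 h \mid P_1 = 1) &= p^h(x, x - e_1 h + e_2 h \mid P_1 = 1) = \frac{|a_{12}|}{Q_h(x)}. \end{aligned} \tag{6.1}$$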
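The probabilities (6.1) and the interval $\Delta t^h(x,1) = h^2/Q_h(x)$ can be checked numerically. Below is a sketch for an interior point, with moves written as integer offsets in units of $h$; the function name and the test numbers are ours. It assumes $a_{12} \le 0$, which is the sign the diagonal moves $\pm(e_1 - e_2)h$ in (6.1) represent, together with $a_{ii} - |a_{12}| > 0$. A direct computation from (6.1) gives mean displacement $b\,\Delta t^h$ and cross moment $2a_{12}\,\Delta t^h$, and the usage lines recover the drift $b$.

```python
def interior_transitions(a11, a22, a12, b1, b2, h):
    """Transition probabilities (6.1) for P_1 = 1 and the interval dt.

    Returns (probs, dt): probs maps an offset (d1, d2), in units of h,
    to its probability; dt = h**2 / Q_h.
    """
    if a12 > 0:
        raise ValueError("this form of (6.1) assumes a12 <= 0")
    if min(a11, a22) - abs(a12) <= 0:
        raise ValueError("need a_ii - |a12| > 0")
    q = 2.0 * (a11 + a22 - abs(a12)) + h * (abs(b1) + abs(b2))
    probs = {
        (1, 0): (a11 - abs(a12) + h * max(b1, 0.0)) / q,
        (-1, 0): (a11 - abs(a12) + h * max(-b1, 0.0)) / q,
        (0, 1): (a22 - abs(a12) + h * max(b2, 0.0)) / q,
        (0, -1): (a22 - abs(a12) + h * max(-b2, 0.0)) / q,
        (1, -1): abs(a12) / q,   # diagonal moves give negative correlation
        (-1, 1): abs(a12) / q,
    }
    return probs, h * h / q


h = 0.1
probs, dt = interior_transitions(1.0, 1.5, -0.4, 0.3, -0.7, h)
drift = [sum(p * d[k] for d, p in probs.items()) * h / dt for k in (0, 1)]
print(sum(probs.values()), drift)
```

The normalization $Q_h(x)$ is exactly the sum of the six numerators, which is why the probabilities add up to one.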

If some $x^i$ (the $i$th component of $x$) equals zero, then the transition probability (6.1) is modified as follows, as a concatenation of two transitions, the first being (6.1). For the second (the 'reflection') step, we distinguish two cases.

Case 1: The '$y$' argument in the $p^h$ in (6.1) is not in $G_h$, but $x^1 \neq 0$ or $y \neq x - e_1 h + e_2 h$. Then simply project (reflect) the process back to the nearest point in $G_h$.

Case 2: Let $x^1 = 0$ and $y = x - e_1 h + e_2 h$. Then the second transition is back to $y = x$ with probability $p_{12}/(1 - p_{11})$, and back to $y = x + e_2 h$ with probability $1 - p_{12}/(1 - p_{11})$. This step accounts for the $p_{12}$-weighted reflection term in (3.14).
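A sketch of the second ('reflection') step, with grid states stored as integer index pairs $(i,j)$ standing for $(ih, jh)$ so that equality tests are exact. Projection to the nearest grid point is done by clipping each coordinate. The requirement $x^2 < B_2$ in Case 2 is our own simplifying assumption, since the text does not say what happens there, and the function name is hypothetical. In Case 2 the reflection moves the chain from $y$ by $+e_1h$ and, with probability $p_{12}/(1-p_{11})$, also by $-e_2h$, so its mean displacement is $h\,(1, -p_{12}/(1-p_{11}))$.

```python
def reflection_step(x, y, n1, n2, p11, p12):
    """Second ('reflection') step applied after a (6.1) move from x to y.

    States are integer pairs (i, j) standing for (i*h, j*h); the grid G_h is
    0 <= i <= n1, 0 <= j <= n2. Returns a dict {final state: probability}.
    """
    if not (0.0 <= p11 < 1.0 and 0.0 <= p12 <= 1.0 - p11):
        raise ValueError("need 0 <= p11 < 1 and 0 <= p12 <= 1 - p11")
    if x[0] == 0 and y == (x[0] - 1, x[1] + 1):
        # Case 2: back to x w.p. r, or to x + e_2 h w.p. 1 - r.
        if x[1] >= n2:
            raise ValueError("sketch assumes x^2 < B_2 in Case 2")
        r = p12 / (1.0 - p11)
        return {x: r, (x[0], x[1] + 1): 1.0 - r}
    if 0 <= y[0] <= n1 and 0 <= y[1] <= n2:
        return {y: 1.0}  # y is already on the grid: no reflection needed
    # Case 1: project back to the nearest grid point (clip each coordinate).
    return {(min(max(y[0], 0), n1), min(max(y[1], 0), n2)): 1.0}


print(reflection_step((0, 3), (-1, 4), 10, 10, p11=0.2, p12=0.6))
print(reflection_step((0, 3), (-1, 3), 10, 10, p11=0.2, p12=0.6))
```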


If $P_1 = 1$ always, then $X^h(\cdot) \Rightarrow X(\cdot)$, uncontrolled and unreflected [9].

Let $P$ denote the control used to get the current state $x$. The actual state for the problem is the pair $(x,P)$, since the cost associated with the next transition depends on whether or not some element of the current control vector is changed. Let $K^h(x,P,P_1)$ denote the cost associated with the transition when the current state is $x$ and the control $P$ changes to $P_1$. For $P_1 = 1$,

$$K^h(x,P,1) = \Delta t^h(x,1)\,k(x),$$

the holding cost only.

We now define some of the transition probabilities and costs when $P_1 \neq 1$. There are 15 possibilities, and only some typical ones will be described. These are constructed so that the limit (as $h \to 0$) of $X^h(\cdot)$ will be the reflected controlled $X(\cdot)$, and so that the associated costs for $X^h(\cdot)$ will also converge to those for $X(\cdot)$. Write $P = (P^{01}, P^{02}, P^{1}, P^{12})$ and $P_1 = (P_1^{01}, P_1^{02}, P_1^{1}, P_1^{12})$.

Let $P_1^{01} = 0$, with the other $P_1^\alpha = 1$. Then use $p^h(x, x - e_1 h \mid P_1) = 1$ (by (A2.1), $x^1 > 0$ here) and $K^h(x,P,P_1) = q_{01}h + k_{01} I_{\{P^{01}=1,\,P_1^{01}=0\}}$. Now let $P_1^{02} = 0$ with the other $P_1^\alpha = 1$. Then $p^h(x, x - e_2 h \mid P_1) = 1$ and $K^h(x,P,P_1) = q_{02}h + k_{02} I_{\{P^{02}=1,\,P_1^{02}=0\}}$. For $P_1^{12} = 0$ and the other $P_1^\alpha = 1$, we have $p^h(x, x - e_2 h \mid P_1) = 1$ and $K^h(x,P,P_1) = q_{12}h + k_{12} I_{\{P^{12}=1,\,P_1^{12}=0\}}$.

Now, let $P_1^{1} = 0$ with the other $P_1^\alpha = 1$. Let $p_{12}g_{d1} \le g_{a1}$ (the reverse case is treated analogously) and refer to Figure 6.1. The line from $x$ to (a) is the mean direction of the appropriate impulse, and its slope (see Section 4) is $[g_{a2} - (1 - p_{22})g_{d2}]/g_{a1} = -p_{12}g_{d1}/g_{a1}$. In order to 'simulate' this mean line, we use

$$p^h(x, x + e_1 h - e_2 h \mid P_1) = p_{12}g_{d1}/g_{a1} = 1 - p^h(x, x + e_1 h \mid P_1).$$

The instantaneous cost is $K^h(x,P,P_1) = k_1 I_{\{P^{1}=1,\,P_1^{1}=0\}}$.
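The following sketch returns the moves, in units of $h$, used when $P_1^1 = 0$, and lets one confirm that the mean step has slope $-p_{12}g_{d1}/g_{a1}$. The branch for $p_{12}g_{d1} > g_{a1}$ is our own guess at the 'analogous' construction and is labeled as such in the code; the function name is hypothetical.

```python
def server1_off_transitions(p12, g_d1, g_a1):
    """Moves (in units of h) and probabilities when P_1^1 = 0, other P_1^a = 1.

    The mean step is proportional to (1, -p12*g_d1/g_a1), the impulse direction.
    """
    if g_d1 <= 0 or g_a1 <= 0 or p12 < 0:
        raise ValueError("rates must be positive and p12 nonnegative")
    ratio = p12 * g_d1 / g_a1
    if ratio <= 1.0:
        # Case given in the text (p12*g_d1 <= g_a1): always +e_1 h,
        # and additionally -e_2 h with probability ratio.
        return {(1, -1): ratio, (1, 0): 1.0 - ratio}
    # Reverse case (hypothetical reading of "treated analogously"):
    # always -e_2 h, and additionally +e_1 h with probability 1/ratio.
    return {(1, -1): 1.0 / ratio, (0, -1): 1.0 - 1.0 / ratio}


moves = server1_off_transitions(p12=0.5, g_d1=1.0, g_a1=2.0)
mean = [sum(p * d[k] for d, p in moves.items()) for k in (0, 1)]
print(moves, mean[1] / mean[0])
```

Splitting the case this way keeps every probability in $[0,1]$ while matching the mean direction exactly; only the size of the mean step (not its direction) differs between the two branches.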

Now, let $P_1^{12} = P_1^{02} = 0$ with all other $P_1^\alpha = 1$. Then $p^h(x, x - e_2 h \mid P_1) = 1$. The 'impulsive' part of $K^h(x,P,P_1)$ is obvious, namely $k_{12} I_{\{P^{12}=1,\,P_1^{12}=0\}} + k_{02} I_{\{P^{02}=1,\,P_1^{02}=0\}}$. But the 'opportunity' cost, that due to $Z_{12}$ and $U_{02}$, is less obvious. It is obtained from the relative rates at which $X_2(\cdot)$ decreases due to the effects of $P^{12}$ and $P^{02}$ (resp.) being off. These rates are (resp.) $p_{12}g_{d1}$ and $g_{a2}$. Thus we use the 'opportunity' cost

$$h\,\frac{q_{12}\,p_{12}g_{d1} + q_{02}\,g_{a2}}{p_{12}g_{d1} + g_{a2}}.$$
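A sketch of $K^h(x,P,P_1)$ for this case, reading the opportunity cost as a rate-weighted average: each step of length $h$ in $X_2(\cdot)$ is split between the two causes in proportion to $p_{12}g_{d1}$ and $g_{a2}$. The function name and argument names are ours.

```python
def cost_links_12_02_off(h, q12, q02, k12, k02, p12, g_d1, g_a2,
                         prev12_on, prev02_on):
    """K^h(x, P, P_1) when P_1^{12} = P_1^{02} = 0, all other P_1^a = 1."""
    # Switching costs are charged only for controls that were on under P.
    impulsive = (k12 if prev12_on else 0.0) + (k02 if prev02_on else 0.0)
    # Rates at which X_2 falls because link 12 (resp. input 02) is off.
    r12, r02 = p12 * g_d1, g_a2
    # A step of length h is attributed to each cause in proportion to its
    # rate, giving h times a rate-weighted average of q12 and q02.
    opportunity = h * (q12 * r12 + q02 * r02) / (r12 + r02)
    return impulsive + opportunity


print(cost_links_12_02_off(0.1, 2.0, 4.0, 1.0, 3.0, 0.5, 2.0, 3.0,
                           prev12_on=True, prev02_on=False))
```

When $q_{12} = q_{02}$ the weighting drops out and the opportunity cost is simply $q_{12}h$, the same form as the single-shutdown cases above.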

The $p^h(x,y \mid P_1)$ and $K^h(x,P,P_1)$ are calculated in a similar way for all the other possibilities.

The dynamic programming equation for our 'approximation' problem is

$$V^h(x,P) = \min_{P_1}\left[e^{-\beta\,\Delta t^h(x,P_1)} \sum_y p^h(x,y \mid P_1)\,V^h(y,P_1) + K^h(x,P,P_1)\right]. \tag{6.2}$$
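To make (6.2) concrete, here is a Jacobi value-iteration sketch (repeated application of the right side of (6.2) to all states at once) on a deliberately simplified, hypothetical one-dimensional analogue. There is a single buffer on $[0,B]$; an 'on' control (playing the role of $P_1 = 1$) with random-walk probabilities of the (6.1) type and interval $h^2/Q_h$; and an 'off' control that moves the chain down one grid step with zero interpolation interval, opportunity cost $qh$ and a switching cost charged only when the previous control was 'on'. All parameter values are made up, and projection at both ends mirrors Case 1. This is a toy, not the two-dimensional problem of the paper. A single sweep is not a contraction because the zero-interval steps are undiscounted, but every 'off' move lowers $x$, so chains of them are finite and the iteration still converges.

```python
import math

# Hypothetical one-dimensional toy analogue of (6.2); all numbers are made up.
H = 0.1                   # grid spacing h
N = 20                    # grid points x = i*h, i = 0..N, so B = N*h = 2
A, DRIFT = 1.0, -0.5      # diffusion coefficient a and drift b
BETA = 1.0                # discount rate beta
HOLD = 1.0                # holding cost rate k(x) = HOLD * x
K_OFF, Q_OFF = 0.5, 2.0   # switch-off cost and opportunity-cost rate
CONTROLS = ("on", "off")  # "on" plays the role of P_1 = 1


def transitions(i, ctrl):
    """Return ([(j, prob), ...], dt) for control ctrl used at grid index i."""
    if ctrl == "on":
        q = 2.0 * A + H * abs(DRIFT)
        up = (A + H * max(DRIFT, 0.0)) / q
        down = (A + H * max(-DRIFT, 0.0)) / q
        # Steps off the grid are projected back to the nearest grid point.
        return [(min(i + 1, N), up), (max(i - 1, 0), down)], H * H / q
    # "off": a deterministic step down with zero interpolation interval.
    return [(i - 1, 1.0)], 0.0


def step_cost(i, prev, ctrl, dt):
    """Toy analogue of K^h(x, P, P_1)."""
    if ctrl == "on":
        return dt * HOLD * i * H                          # holding cost only
    return Q_OFF * H + (K_OFF if prev == "on" else 0.0)  # opportunity + switch


def bellman(V):
    """One Jacobi sweep of the right side of (6.2) over all states (i, prev)."""
    new, policy = {}, {}
    for i in range(N + 1):
        allowed = CONTROLS if i > 0 else ("on",)  # cannot step below x = 0
        for prev in CONTROLS:
            best, arg = math.inf, None
            for ctrl in allowed:
                moves, dt = transitions(i, ctrl)
                val = math.exp(-BETA * dt) * sum(p * V[(j, ctrl)] for j, p in moves)
                val += step_cost(i, prev, ctrl, dt)
                if val < best:
                    best, arg = val, ctrl
            new[(i, prev)], policy[(i, prev)] = best, arg
    return new, policy


def value_iteration(tol=1e-8, max_iter=100_000):
    V = {(i, prev): 0.0 for i in range(N + 1) for prev in CONTROLS}
    for _ in range(max_iter):
        new, policy = bellman(V)
        if max(abs(new[s] - V[s]) for s in V) < tol:
            return new, policy
        V = new
    raise RuntimeError("value iteration did not converge")


V, policy = value_iteration()
print(V[(N, "on")], policy[(N, "on")])
```

Because the switching cost is charged only on a change from 'on' to 'off', $V(x,\text{off}) \le V(x,\text{on})$ at every grid point, which is the role the extra state component $P$ plays in (6.2).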

The weak convergence methods of this paper can be used to show that $V^h(x,P) \to V(x,P) = \inf_\pi V(\pi,x,P)$. It can be shown that, for each $x$, there is a $(u,t)$-dependent control such that the approximation methods (for the control) in [9, Chapter 9] can be used. For reasonable grid sizes, say $50 \times 50$, the numerical problem is quite tractable.

For the numerical problem, we do not need to duplicate the dynamics of the original system $X^\epsilon(\cdot)$; we can use any controlled process which has the same controlled limit equation. See the book [9] for a fuller development of this computational point of view for a large class of more classical problems.